SILICON-BASED STORAGE VIRTUALIZATION SERVER



CROSS-REFERENCES TO RELATED APPLICATIONS 
[01] This application claims priority to U.S. Provisional Application No. 60/268,694, filed
February 13, 2001 and titled "Virtual Storage Systems", which is incorporated herein by
reference.

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER 
FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT 
[02] NOT APPLICABLE

REFERENCE TO A "SEQUENCE LISTING," A TABLE, OR A COMPUTER
PROGRAM LISTING APPENDIX SUBMITTED ON A COMPACT DISK.
[03] NOT APPLICABLE

BACKGROUND OF THE INVENTION 
[04] The present invention relates to storage area network (SAN) systems.
[05] Storage area networks, or SANs, are networks of storage subsystems connected to
servers. The goal of a SAN is to allow multiple servers of different operating systems (Unix, NT)
to each "see" one large data repository. SANs provide four key benefits over direct attached
storage: reduced utilization (increased bandwidth) on the Local Area Network and increased
reliability, manageability, and scalability.

[06] A current trend in SANs is storage virtualization. Storage virtualization describes the
process of representing, to a user, a number of discrete physical storage devices as a single
storage pool having a single set of characteristics. For example, in a storage area network
connecting host computers with storage devices, the user perceives a single block of disk space
with a defined reliability (e.g., 100 GB at RAID1); however, the user's host computer is
configured to access the storage devices such that 100 GB at RAID1 is provided, regardless of
whether the data is stored on a single RAID1 disk array or is split across multiple, separate disks.



[07] Most of the solutions available in the marketplace today to virtualize a SAN are software
based. There are solutions that are host based, storage based, and SAN based. In a host-based
solution, each host computer must be aware of the storage devices connected to the storage area
network because each host computer manages the storage virtualization that is presented to its
users. When the storage devices connected to the storage area network are modified (such as a
new device being added or an existing device being removed), each host computer must be
reconfigured to accommodate the modification. Such reconfiguration involves work by network
administrators and is error prone. The storage-based solutions have similar issues. The SAN-based
solutions are better than the host-based and storage-based solutions but lack scalability and
performance.

[08] The present invention is directed toward improvements in this and other areas.

BRIEF SUMMARY OF THE INVENTION

[09] According to one embodiment of the present invention, a storage server in a storage area
network connects host computers and storage devices. The storage server includes storage
processors interconnected by a switching circuit. The storage server also includes a processor
that configures a path between two storage processors based on a command packet. Data is then
routed on the path more quickly than in many existing systems. In one embodiment, routing tags
are associated with the data packets, the storage processors examine the routing tags without
having to wait until the entire data packet is received, and one storage processor begins routing
data packets to another storage processor in accordance with the routing tags without having to
examine or receive the entire data packet.

[10] A fuller understanding of the present invention may be obtained by reference to the 
following drawings and related detailed description. 


BRIEF DESCRIPTION OF THE DRAWINGS 
[11] FIG. 1 is a block diagram of a storage area network including a storage server according 
to an embodiment of the present invention; 

[12] FIG. 2A is a block diagram of hardware components in the storage server according to an
embodiment of the present invention;






[13] FIG. 2B is a block diagram of management functions in the storage server according to 
an embodiment of the present invention; 

[14] FIG. 3 is a diagram showing the relationship between PLUNs, media units and VLUNs 
according to an embodiment of the present invention; 
[15] FIG. 4 is a block diagram showing upstream and downstream components according to
an embodiment of the present invention;

[16] FIG. 5 is a flow diagram showing command and data processing according to an 
embodiment of the present invention; 

[17] FIG. 6 is a flow diagram showing the processing of the tree search engine according to an
embodiment of the present invention;

[18] FIG. 7 is a data diagram of routing information according to an embodiment of the 
present invention; 

[19] FIG. 8 is a data diagram of a command frame according to an embodiment of the present
invention;

[20] FIG. 9 is a data diagram of the tags field in the command frame according to an
embodiment of the present invention;

[21] FIG. 10 is a flow diagram of read command processing according to an embodiment of
the present invention;

[22] FIG. 11 is a flow diagram of write command processing according to an embodiment of
the present invention;

[23] FIG. 12 is a block diagram of the modules in the picocode according to an embodiment
of the present invention;

[24] FIG. 13 is a flow diagram of command frame header manipulation in a read operation 
according to an embodiment of the present invention; and 
[25] FIG. 14 is a flow diagram of command frame header manipulation in a write operation
according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION 
[26] The detailed description is organized as follows. First, an overview is given of the
overall system implementing the present invention. Second, a high-level description is provided
of the features of the present invention. Finally, supporting low-level details are provided that
further detail the features of the present invention.

[27] OVERVIEW 

[28] FIG. 1 shows a storage server 100 according to an embodiment of the present invention.
The figure also shows a storage area network (SAN) 102, a number of physical storage devices 
104, and a number of host computers 106. 

[29] The storage server 100 is also referred to as a Virtual Storage Exchange (VSX) or
Confluence Virtual Storage Server (CVSS), and is further detailed in FIG. 2A. The storage
server 100 provides storage virtualization to servers in a homogeneous as well as a
heterogeneous environment, providing a solution to large data centers, ISPs, SSPs, and ASPs in
the area of network storage.

[30] The SAN 102 can be any type of computer network. It is referred to as a storage area
network in the present application because that is its relevant function with respect to the
embodiments of the present invention. In an embodiment of the present invention, the SAN 102
is a Fibre Channel network, the host computers 106 and the storage devices 104 are configured
to communicate with a Fibre Channel network, and the storage server 100 is also configured to
communicate with a Fibre Channel network. Thus, the storage server 100 can be easily added to
an existing SAN.

[31] The physical storage devices 104 include tape drives, disk arrays, JBODs ("just a bunch
of disks"), or other types of data storage devices. The physical storage devices 104 can be
connected directly to the host computers 106 via the SAN 102 or can be indirectly connected to
the host computers 106 via the SAN 102 and the storage server 100. As discussed above in the
Background, management of storage virtualization is burdensome when the storage devices 104
are directly connected to the host computers 106 via the SAN 102. The present invention
improves management of storage virtualization by using the storage server 100 to indirectly
connect the storage devices 104 to the host computers 106.

[32] The host computers 106 can be servers or stand-alone computers. The host computers
106 can be directly connected to the SAN 102 or indirectly connected via a switch, router, or
other communication link.

[33] FIG. 2A is a block diagram of the storage server 100 showing the hardware components
related to embodiments of the present invention, including a storage processor 110, a line card
112, a virtual server card 114, and a switch fabric 116.

[34] The storage server 100 may include one or more storage processors 110. The storage
processors 110 process the storage commands and data to be stored as information flows
between the host computers 106 and the storage devices 104. One or more of the storage
processors 110 may be included on each line card 112. The storage server 100 includes space for
numerous line cards 112, so the capabilities of the storage server 100 can be modularly increased
by adding more line cards 112 or more storage processors 110. Each storage processor 110 is
associated with one or more ports of the storage server 100.

[35] The storage server 100 may include one or more virtual server cards 114. The virtual
server cards 114 control the operation of the storage server 100 and control the line cards 112, which
perform the actual work of transferring commands and data.

[36] The switch fabric 116 connects the storage processors 110. The switch fabric switches
information received at one port to another port of the storage server 100. For example, when a
host computer 106 wants to read data stored on the storage area network 102, its request is
processed by the storage processor 110 associated with the port associated with that host
computer 106. That storage processor 110 is referred to as the upstream storage processor 110.
The upstream storage processor 110 communicates with a downstream storage processor 110
associated with the port associated with the storage device 104 storing the data to be read, via the
switch fabric 116. Then the switch fabric 116 transfers the data read from the storage device to
the host computer 106, via the downstream and upstream storage processors 110.

[37] FIG. 2B is a block diagram of the storage server 100 showing the functionality relevant to
embodiments of the present invention. The functions of the storage server 100 may be
implemented by one or more processors that execute processing according to one or more
computer programs, microcode segments, hardware structures, or combinations thereof. The
functions relevant to the present invention are the media unit (MU) manager 120, the virtual
logical unit number (virtual LUN or VLUN) manager 122, and the physical logical unit number
(physical LUN or PLUN) manager 124. Additional details of the storage server 100 are provided
in other applications assigned to the present assignee and filed on February 13, 2002 that claim
the benefit of the above-noted Provisional Application No. 60/268,694 and are hereby
incorporated herein by reference as follows: Attorney Docket No. 20949P-000300US titled
"Storage Virtualization and Storage Management to Provide Higher Level Storage Services",
Attorney Docket No. 20949P-000500US titled "Method and Apparatus for Identifying Storage
Devices", Attorney Docket No. 20949P-000600US titled "System and Method for Policy Based
Storage Provisioning and Management", Attorney Docket No. 20949P-000700US titled "Virtual
Data Center", Attorney Docket No. 20949P-000800US titled "Failover Processing in a Storage
System", Attorney Docket No. 20949P-000900US titled "RAID at Wire Speed", and Attorney
Docket No. 20949P-001000US titled "Method for Device Security in a Heterogeneous Storage
Network Environment".

[38] The PLUN manager 124 manages data and command transfer to and from the storage
devices 104. Each storage device 104 may have associated therewith a PLUN that is used for
identifying each particular storage device 104.

[39] The VLUN manager 122 manages data and command transfer to and from the host
computers 106. Each host computer 106 may be associated with one or more VLUNs. Each
VLUN represents a virtual address space (e.g., gigabytes of storage) with defined attributes (e.g.,
performance parameters, reliability level, etc.). As such, each host computer 106 exchanges data
and commands with the storage server 100 with reference to a particular VLUN.

[40] The MU manager 120 basically translates between VLUNs and PLUNs. The MU
manager 120 is responsible for managing the address space of all the storage devices 104
(physical LUNs) connected to the storage server 100. The MU manager 120 also manages the
address space of the storage constructs built within the storage server 100, including slices,
concatenations, RAID0 (stripes) and RAID1 (mirrors).

[41] The MU manager 120 uses an abstract block-storage addressing technique that enables
address spaces to be treated in a logical manner, regardless of the underlying storage constructs
or physical LUNs. These logical address spaces can be combined together into more complex
and feature-rich storage constructs, which are also treated simply as abstract block-storage
address spaces.

[42] Used in conjunction with a virtual LUN, these logical address spaces can be configured
to appear as LUNs on a multi-ported storage device. This process of presenting physical LUNs
as logical address spaces on virtual devices is referred to as storage virtualization.

[43] Abstract block-storage addressing is achieved via a data structure known as a media unit
(MU).
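
As a rough illustration of abstract block-storage addressing, the following Python sketch models media units as objects that each map a logical block address to a physical LUN and block. The class names (PhysicalMU, SliceMU, ConcatMU) and the resolve method are hypothetical and are not taken from this description; only slices and concatenations are shown, and stripes and mirrors are omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PhysicalMU:
    """First-level media unit: corresponds directly to one PLUN (assumed model)."""
    plun: str
    num_blocks: int

    def resolve(self, lba: int) -> Tuple[str, int]:
        if not 0 <= lba < self.num_blocks:
            raise ValueError("LBA out of range")
        return (self.plun, lba)


@dataclass
class SliceMU:
    """A contiguous sub-range of another media unit."""
    parent: object
    offset: int
    num_blocks: int

    def resolve(self, lba: int) -> Tuple[str, int]:
        if not 0 <= lba < self.num_blocks:
            raise ValueError("LBA out of range")
        return self.parent.resolve(self.offset + lba)


@dataclass
class ConcatMU:
    """Several media units laid end to end as one address space."""
    members: List[object]

    @property
    def num_blocks(self) -> int:
        return sum(m.num_blocks for m in self.members)

    def resolve(self, lba: int) -> Tuple[str, int]:
        if lba < 0:
            raise ValueError("LBA out of range")
        for member in self.members:
            if lba < member.num_blocks:
                return member.resolve(lba)
            lba -= member.num_blocks
        raise ValueError("LBA out of range")


# Example: a volume built from a 30-block slice of one disk plus a whole second disk.
disk_a = PhysicalMU("plun0", 100)
disk_b = PhysicalMU("plun1", 50)
volume = ConcatMU([SliceMU(disk_a, 20, 30), disk_b])
```

Because every construct exposes the same resolve interface, a construct can be nested inside another without the caller knowing what lies underneath, which is the property the text attributes to media units.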

[44] FIG. 3 shows the relationship between physical media units and the other services. The
PLUN manager 124 manages PLUNs, the MU manager 120 manages media units, and the
VLUN manager 122 manages VLUNs.

[45] In addition, FIG. 3 shows the relationships between PLUNs, media units, and VLUNs.
Generally, a PLUN directly corresponds to a storage device, such as a disk or a disk array. Such
a direct one-to-one relationship is generally shown in the following figures. However, a PLUN
can also be associated with a portion of the storage device. Multiple PLUNs can be associated
with different portions of a single storage device.

[46] Each physical media unit (first-level media unit) generally directly corresponds to a
single, respective PLUN.

[47] Each VLUN is generally associated with a single, respective media unit.

[48] The following sections further describe some aspects of the present invention.

[49] HIGH-LEVEL DESCRIPTION

[50] According to one embodiment, the storage area network is a Fibre Channel network and
the storage server 100 includes a number of Fibre Channel ASICs that convert Fibre Channel
frames into a second format for internal processing by the storage server 100. These Fibre
Channel ASICs are further described in the Provisional Application No. 60/317,817.

[51] FIG. 4 is a block diagram showing upstream and downstream storage processors 110 and
Fibre Channel ASICs 140. The term "upstream" is used to refer to the components closest to the
host computers 106, and the term "downstream" is used to refer to the components closest to the
storage devices 104. According to one embodiment, each storage processor 110 uses a Fibre
Channel ASIC to connect four 1 GB/s Fibre Channel ports.

[52] FIG. 5 is a flowchart of a read process 200 according to an embodiment of the present
invention. In general, read and write traffic is handled by the storage processors 110.
Specialized hardware (referred to as microengines) in the storage processors may be used to
implement these processing steps. These storage processors are generally referred to as the
virtualization engine. Non-read/write commands may be handled by an embedded CPU.

[53] In step 202, a host computer 106 sends a read command to the storage server 100 through
the storage area network 102. This read command may be in the form of a command packet
(also referred to as a command frame). The read command arrives at the storage server 100 and
is processed by the upstream Fibre Channel ASIC.

[54] In step 204, the command arrives at the upstream storage processor. The command
includes what is referred to as a command handle. The upstream storage processor looks for the
host LU from the command handle using the tree search engine. From the host LU, the upstream
storage processor finds the virtual unit and starts decomposing the request into physical units.

[55] The tree search engine looks up the host LU in a lookup table. The lookup table contains
virtualization information; that is, information that relates the virtual storage space (that the host
computers see) to the physical storage space (that may be provided by multiple physical disks).
The lookup table is programmed by the VSC 114 (see FIG. 2A) using configuration commands.

[56] In step 206, the upstream storage processor passes a handle via the switching circuit to
the downstream storage processor to identify the device to talk to.

[57] In step 208, the handle arrives at the downstream storage processor.

[58] In step 210, the downstream storage processor uses the handle passed in to send the
command to the correct physical disk. The downstream storage processor sends the command
along with routing tags and a storage processor (SP) command handle. The routing tags and SP
command handle are used when a data frame (or data packet) returns. The above steps 206-210
in effect configure a path between the upstream and the downstream storage processors.

[59] In step 212, the physical disk sends a data frame (data packet) back through the storage
area network to the storage server 100.

[60] In step 214, the downstream Fibre Channel ASIC receives the data frame. The
downstream Fibre Channel ASIC performs exchange management and looks up the command
context.

[61] In step 216, the downstream Fibre Channel ASIC sends the data frame along with the
routing tags and an SP command handle.

[62] In step 218, the downstream storage processor receives the data frame. The routing tags
allow the downstream storage processor to route the frame to the upstream storage processor (via
the switching circuit) even before the entire packet arrives. According to one embodiment, the
first 64 bytes of the data frame are inspected before the full payload arrives.
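
The idea of routing a frame from its leading bytes can be sketched as follows. This is only an illustration under stated assumptions: the header layout (a one-byte target blade followed by a two-byte QID at offset 0) and the function name cut_through are hypothetical, and a Python byte stream stands in for the hardware data path.

```python
import io
import struct

HEADER_BYTES = 64  # portion of the frame inspected first, per the text


def cut_through(stream, chunk_size=256):
    """Yield (target_blade, chunk) pairs for one frame.

    The routing decision is made after reading only the first 64 bytes,
    so forwarding starts before the rest of the payload has been read.
    """
    header = stream.read(HEADER_BYTES)
    if len(header) < HEADER_BYTES:
        raise ValueError("frame shorter than routing header")
    # Assumed layout: target blade (1 byte), QID (2 bytes, big-endian).
    blade, _qid = struct.unpack_from(">BH", header, 0)
    yield blade, header
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield blade, chunk


# Example frame: routing header for blade 3, padded to 64 bytes, then payload.
example_frame = struct.pack(">BH", 3, 0x0102) + bytes(61) + b"x" * 500
example_stream = io.BytesIO(example_frame)
forwarder = cut_through(example_stream)
first_blade, first_chunk = next(forwarder)
bytes_read_at_decision = example_stream.tell()
remaining = b"".join(chunk for _, chunk in forwarder)
```

At the point the first pair is produced, only the 64-byte header has been consumed from the stream, which mirrors the claim that routing does not wait for the whole packet.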

[63] In step 220, the data packet arrives at the upstream storage processor (via the switching
circuit). The upstream storage processor references the data context and sends the data frame out
along with the corresponding Fibre Channel ASIC command handle to allow it to get command
context.

[64] In step 222, the upstream Fibre Channel ASIC performs exchange management (using
the command handle) and sends out the data frame to the host computer 106 via the storage area
network 102.

[65] Although FIG. 5 is particularly directed toward a read process, a write process involves
similar steps.

[66] According to another embodiment, the storage server 100 may generate the commands
internally. Commands can be generated internally when data is to be transferred from one
storage device to another, or when data is to be duplicated or reconstructed on a second storage
device.

[67] According to another embodiment, the host computer and the storage device may be
connected to the same storage processor. In such a case, the upstream and downstream storage
processors are the same storage processor.

[68] More extensive details of these processes follow.

[69] LOW-LEVEL DETAILS

[70] As discussed above, the storage server 100, also referred to as the Virtual Storage
Exchange (VSX), includes multiple Storage Processors (SPs) inter-connected by redundant
switching circuits. The VSX also has at least one CPU for configuration and management of
these SPs. The CPU also provides higher level storage services.

[71] The SCSI processing is handled by the SPs. Each SP has the following components:

[72] 1. 16 micro-engines that handle the Read/Write Commands;

[73] 2. An embedded CPU that handles all the non-Read/Write Commands including Error
Recovery;

[74] 3. A Hardware Classifier that identifies a frame type;

[75] 4. A Dispatch Unit to enqueue a frame to a correct microcode handler running in the
micro-engines;

[76] 5. A Tree Search Engine to find a leaf entry for a given pattern;

[77] 6. A Counter Engine, which is a coprocessor that allows statistics collection (up to 4M
counters in one embodiment); and

[78] 7. A Switch interface that connects the SP to the switching circuit (also referred to as the
switch fabric).

[79] The Read/Write Commands are processed by the Micro-engines. The RD/WR command
processing is further described below.

[80] A Command is received at the upstream SP and the upstream SP does most of the
processing of the command. It authenticates the command/access controls, determines where to
ship the command, calculates the new start logical block address (LBA) and request blocks,
builds the new command data block (CDB) and ships it to the downstream SP. If there is only
one path to get to the target device, the upstream SP builds a completed command, which will
then be forwarded to the target device via the downstream SP and FC-ASIC. If the downstream SP
has several paths to get to the target device, the upstream SP leaves it up to the downstream SP
to choose the access path. In that case, the downstream SP will fill in the appropriate
information about the access path and forward the command to the target device via the FC-ASIC.

[81] When a command frame enters the SP, the ingress command handler (running in the
micro-engines) will be called. The command handler (CmdHandler) allocates an IO Control Block
(IoCB) structure to store the command context and to keep track of the state of the command.
From the command information, the CmdHandler constructs a search-key that includes SP port,
FC-ASIC Device Handle, and FC LUN (logical unit number). The search-key is passed into the SP
Tree Search Engine (TSE) to search for the hardware LUN (HLUN) associated with the
command. If the search fails, the command will be rejected because the LUN does not exist; otherwise,
the command processing will be continued. An HLUN is a structure that ties the server to a virtual
LUN (VLUN); therefore, the associated VLUN structure can be retrieved via HLUN
information.
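
The lookup described above can be pictured with a small Python sketch. It is an assumption-laden model, not the picocode: a plain dictionary stands in for the Tree Search Engine, and the names Hlun, IoCB, and handle_command, along with the state strings, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Search-key fields named in the text: (SP port, FC-ASIC device handle, FC LUN).
SearchKey = Tuple[int, int, int]


@dataclass
class Hlun:
    """Ties a server to a VLUN (fields are illustrative)."""
    server: str
    vlun_id: int


@dataclass
class IoCB:
    """Command context and state for one command (fields are illustrative)."""
    key: SearchKey
    state: str = "NEW"
    vlun_id: Optional[int] = None


def handle_command(hlun_table: Dict[SearchKey, Hlun],
                   sp_port: int, dev_handle: int, fc_lun: int) -> IoCB:
    """Allocate an IoCB, search for the HLUN, and reject on a failed search."""
    iocb = IoCB(key=(sp_port, dev_handle, fc_lun))
    hlun = hlun_table.get(iocb.key)  # dict lookup stands in for the TSE search
    if hlun is None:
        iocb.state = "REJECTED_NO_LUN"
        return iocb
    iocb.vlun_id = hlun.vlun_id
    iocb.state = "IN_PROGRESS"
    return iocb


# Example table with one HLUN entry.
example_table = {(1, 7, 0): Hlun(server="host-a", vlun_id=42)}
found = handle_command(example_table, 1, 7, 0)
missing = handle_command(example_table, 1, 7, 5)
```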

[82] Based on the start LBA/number of blocks requested in the received command and the
VLUN information, the CmdHandler decomposes the received command to a set of physical
commands (the set might be one or more commands depending on the aforementioned
information). If the command decomposes into more than one physical command (pCmd), each pCmd has its
own IoCB (referred to as a child IoCB or cIoCB) that is used to store its own command context.
These cIoCBs are linked to the original IoCB (referred to as a master IoCB or mIoCB).
Thereafter, the CmdHandler builds these commands with their physical start LBAs and numbers
of blocks that are mapped to the physical target devices. These commands will then be sent to
the downstream SPs that directly connect to the target devices. The reference to the IoCB is also
passed between the upstream SP and the downstream SP as command handles (upstream command
handle and downstream command handle) that will be used to locate the IoCB associated with the
command.
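
To make the decomposition step concrete, the sketch below splits one virtual request across a VLUN that is assumed to be a simple concatenation of physical extents. The Extent and IoCB classes and the decompose function are hypothetical; other constructs such as stripes or mirrors would need different mapping logic.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Extent:
    """One physical region backing part of a VLUN (assumed concatenation)."""
    plun: str
    start_lba: int
    num_blocks: int


@dataclass
class IoCB:
    target: str
    start_lba: int
    num_blocks: int
    children: List["IoCB"] = field(default_factory=list)


def decompose(extents: List[Extent], vlun_lba: int,
              num_blocks: int) -> Tuple[IoCB, List[IoCB]]:
    """Map a virtual (LBA, count) request onto physical commands.

    Returns the master IoCB and the list of physical commands. Child IoCBs
    are linked to the master only when more than one command results.
    """
    master = IoCB("VLUN", vlun_lba, num_blocks)
    pcmds: List[IoCB] = []
    base = 0                  # virtual LBA at which the current extent begins
    lba = vlun_lba
    remaining = num_blocks
    for ext in extents:
        end = base + ext.num_blocks
        if remaining > 0 and base <= lba < end:
            offset = lba - base
            count = min(remaining, ext.num_blocks - offset)
            pcmds.append(IoCB(ext.plun, ext.start_lba + offset, count))
            lba += count
            remaining -= count
        base = end
    if remaining:
        raise ValueError("request extends past the end of the VLUN")
    if len(pcmds) > 1:
        master.children = pcmds
    return master, pcmds


# Example: two 50-block extents; a 20-block read starting at virtual LBA 40.
layout = [Extent("plunA", 100, 50), Extent("plunB", 0, 50)]
split_master, split_cmds = decompose(layout, 40, 20)
single_master, single_cmds = decompose(layout, 0, 10)
```

In the example, the request straddles the boundary between the two extents, so it yields two physical commands of ten blocks each, while the second request fits in one extent and yields a single command with no child IoCBs.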

[83] As mentioned earlier, the downstream SP might have more than one access path to the
target device. If there is a single access path, the pDevpath key is passed from the upstream SP to
the downstream SP; otherwise, the pLUN key is passed. In the multi-path scenario, the downstream
CmdHandler searches for the physical LUN (PLUN) and chooses an access path, pDevpath.
This leads to another search. In the single path scenario, the downstream CmdHandler searches
pDevpath directly to get the essential information to access the target device.

[84] FIG. 6 shows high-level flow diagrams of read/write command processing. In the
upstream path, the tree search engine (TSE) indicates the HLUN, VLUN, PLUNup and
pDevPath.

[85] In the downstream path, first the MPATH_BIT is checked. If the MPATH_BIT is clear,
then the downstream PLUN does not have multiple paths. The downstream SP will then issue a
search to the pDevPath table. If the MPATH_BIT is set, the search will be done on the PLUN
table. The PLUN leaf will have all the possible paths to the storage.
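
The downstream path selection can be summarized in a few lines of Python. The bit position of MPATH_BIT, the dictionary-based tables, and the default policy of picking the first listed path are all assumptions made for illustration; the text does not specify how a path is chosen.

```python
MPATH_BIT = 0x1  # hypothetical bit position for the multi-path flag


def resolve_path(flags, key, pdevpath_table, plun_table,
                 choose=lambda paths: paths[0]):
    """Return the pDevPath entry used to reach the target device.

    With MPATH_BIT clear, `key` is a pDevPath key and one search suffices.
    With MPATH_BIT set, `key` is a PLUN key: the PLUN leaf lists all possible
    paths, one is chosen, and a second search fetches that pDevPath entry.
    """
    if flags & MPATH_BIT:
        candidate_paths = plun_table[key]
        path_key = choose(candidate_paths)
        return pdevpath_table[path_key]
    return pdevpath_table[key]


# Example tables (contents are illustrative).
example_pdevpaths = {"pathA": "port1", "pathB": "port2", "single": "port9"}
example_pluns = {"plun7": ["pathA", "pathB"]}
```

For instance, resolve_path(0, "single", example_pdevpaths, example_pluns) performs one lookup, while passing MPATH_BIT with the key "plun7" performs the PLUN search followed by the pDevPath search.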

[86] FIG. 7 shows a data diagram of routing information 310 (see also FIGS. 8 and 9)
according to an embodiment of the present invention. The routing information 310 is used
between the FC ASIC and the SP.

[87] Upon receiving a command, the command handler uses the command information
and the programmed VLUN information to determine the appropriate access path. Once the
access path has been determined, all subsequent frames (transfer ready, data, and status) are
forwarded on the same path; therefore, the microcode adds a routing information field that is
used to speed up the frame routing process. The routing information is embedded in the frame.
It allows the SP microcode to get the routing information directly from the received frame without
looking up the IoCB structure first. Since the routing information is within the first 64 bytes of
the frame, picocode can start working out how to route the frame before the rest of the frame
arrives. This improves performance.

[88] Routing information includes the following items:
[89] 1. Blade Number: the target SP switch blade number;

[90] 2. QID: an encrypted value that is used by SP ENQ coprocessor to route the frame to an
SP port;

[91] 3. FCAPort: the FC ASIC Port number;

[92] 4. DMU: the Data Mover Unit that is used by SP ENQ coprocessor to route the frame to
an SP port; and

[93] 5. DSU: the Data Store Unit that is used by SP ENQ coprocessor to route the frame to an
SP port.

[94] The routing information field consists of 2 parts: DST and SRC. The FC ASIC is given
this routing information and will pass it back to the SP unmodified when it sends data, status or a
control frame.

[95] The SP looks at the DST routing information field and programs the FCBPage register to
route the frame.

[96] The abbreviations for the fields in the routing information are as follows:

[97] 1. TB identifies the target blade. This is programmed into the Ingress FCBPage register.

[98] 2. QID is used at the target SP when filling up the Egress FCBPage QID field.

[99] 3. FCAport is the FCASIC port identifier at the target SP.

[100] 4. DMU identifies the target DMU. This is programmed into the Ingress FCBPage
register.

[101] 5. DSU identifies the target DSU to use. This is programmed into the Ingress FCBPage
register.

[102] When a command comes in at the upstream SP, the SRC routing information fields will
be filled. The command is then shipped to the downstream SP. The downstream SP will fill in
the DST routing information fields. Before shipping it to the FCASIC, the SRC and DST routing
information fields are swapped.

[103] When the FCASIC returns with data, control or status, the routing information is returned 
as is. The SP will look at the DST routing information. Since the SRC and DST were swapped 
at the previous step, the DST routing information now identifies the upstream SP. The frame can 
be routed directly to the upstream SP. 

[104] At the upstream SP, fields from the DST routing information are used directly to send the
frame out. Before sending the frame out, the SRC and DST routing information fields are
swapped.
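
The SRC/DST handling in the preceding paragraphs can be traced with a short Python sketch. The Route and RoutingInfo classes and the function names are hypothetical; the fields mirror the items listed for the routing information, and the FC ASIC is modeled simply as returning the field unchanged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    blade: int
    qid: int
    fca_port: int
    dmu: int
    dsu: int


@dataclass(frozen=True)
class RoutingInfo:
    dst: Optional[Route] = None
    src: Optional[Route] = None

    def swapped(self) -> "RoutingInfo":
        return RoutingInfo(dst=self.src, src=self.dst)


def upstream_ingress(upstream: Route) -> RoutingInfo:
    """Upstream SP fills SRC when the command comes in."""
    return RoutingInfo(src=upstream)


def downstream_to_fc_asic(info: RoutingInfo, downstream: Route) -> RoutingInfo:
    """Downstream SP fills DST, then swaps SRC and DST before the FC ASIC."""
    return RoutingInfo(dst=downstream, src=info.src).swapped()


def upstream_egress(info: RoutingInfo) -> RoutingInfo:
    """Upstream SP swaps SRC and DST again before sending the frame out."""
    return info.swapped()


# Example trace of one command and its returning frame.
up = Route(blade=1, qid=10, fca_port=0, dmu=0, dsu=0)
down = Route(blade=4, qid=22, fca_port=2, dmu=1, dsu=1)
at_fc_asic = downstream_to_fc_asic(upstream_ingress(up), down)
returned = at_fc_asic              # FC ASIC passes the field back unmodified
after_upstream = upstream_egress(returned)
```

After the first swap the DST half names the upstream SP, so a returning data or status frame can be routed there from the frame contents alone, without first locating the IoCB.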

[105] The storage processors operate according to programming code referred to as picocode.
The following figures and associated description elaborate on the picocode and related
embodiments of the invention.

[106] FIG. 8 shows a Command Frame 300 encapsulated in PPP format. This is the general
frame format employed by the Picocode. The FC frame that enters the SP is encapsulated within
an Ethernet frame. The SP hardware classifier will look at the PROTOCOL field to determine
which routine to call. The subType is used by microcode to further differentiate the POS frame
type.

[107] FIG. 9 shows the format of the TAGS field 302. The TAGS header 302 in the command
frame 300 is used to carry unique identifiers between SPs in order to get command context. The
TAGS field 302 includes the following data fields.

[108] The FC handle 304 is the handle used by the FC ASIC to get its command handle.

[109] The SP qualifier 306 and SP handle 308 are interpreted as a single handle by the FC
ASIC. The SP handle 308 is used by the SP to get its command context.

[110] The routeinfo field 310 is sent by the SP to the FC ASIC in a RDY/ACK frame. The FC
ASIC preferably sends the latest one.

[111] The Ctrl field 312 is a general-purpose field.

[112] The frameId 314 is a sequentially increasing number. Its use is like SEQ_CNT in a
single sequence.

[113] The port 316 identifies the port.

[114] The plSzFillBst (payload size/fill bits) 318 is inserted by the FC ASIC. The field has a
different meaning depending on the frame type. For the receiving frame, it indicates the total byte
count of the payload. For the sending frame, it indicates how many stuff bits were filled in the last
data word.

[115] The relOffset 320 indicates the relative offset for the data payload. It is only valid in the
receiving frame (SP's point of view).

[116] The port handle 322 is used to identify a device the SP wants to talk to when it sends a
command descriptor down.

[117] The DevHandle 324 is used to store the device handle. It is used by the FC ASIC to map the
command to the specific device. Its usage is similar to S_ID and D_ID.

[118] The rsvd field 326 is unused.

[119] According to an embodiment of the present invention, there are 3 types of data frames:
COMMAND, DATA and STATUS. The Type field in the Ethernet header is set to special
defined codes to distinguish between the 3 types. The hardware classifier in the SP uses this
field to call the correct entry point.
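
A software analogue of this classification step is a table lookup keyed on the Type field. In the sketch below the numeric codes, the handler names, and the "unclassified" fallback are all hypothetical, since the text does not give the actual code values or say what happens to unknown types.

```python
from enum import Enum


class FrameType(Enum):
    # Hypothetical Type codes; the real values are not stated in the text.
    COMMAND = 0x01
    DATA = 0x02
    STATUS = 0x03


def on_command(payload: bytes) -> str:
    return "command entry point"


def on_data(payload: bytes) -> str:
    return "data entry point"


def on_status(payload: bytes) -> str:
    return "status entry point"


ENTRY_POINTS = {
    FrameType.COMMAND: on_command,
    FrameType.DATA: on_data,
    FrameType.STATUS: on_status,
}


def classify(type_field: int, payload: bytes = b"") -> str:
    """Pick the entry point from the Type field, as the classifier would."""
    try:
        frame_type = FrameType(type_field)
    except ValueError:
        return "unclassified"
    return ENTRY_POINTS[frame_type](payload)
```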

[120] The SP picocode handles RD and WR command types for virtualized devices. Other
commands are sent to the SP to handle. In the case of native devices, the SP picocode forwards
all commands/frames to the device except reserve and release commands. Although the number of
command types handled by picocode is small, the majority of the traffic is RD/WR commands.

[121] As described above with reference to FIG. 4, this document uses the terms upstream and
downstream. Upstream is used to describe the SP that is connected to the server. This is the SP
that sees the command. Downstream is used to describe the SP that is connected to the target
device. The upstream SP receives a command, processes it and sends it to the downstream SP.
The downstream SP receives the command and sends it to the target device.

[122] The upstream SP does most of the processing of the command. The upstream SP
determines where to ship the command, calculates the new start logical block address (LBA) and
requested blocks, builds the new CDB, and ships it to the downstream SP. The downstream
SP takes care of the FC headers, decides which port to send the command frame out of, and handles
the sequence management to the target device.
[123] Command frames upstream are handled as follows. 

[124] When a command frame enters the SP, the ingress command handler is called.
Command frames are part of a new FC exchange. An IoCB structure is allocated for the new
command. The IoCB structure is used to keep track of the state of the new command. Sections
of the command frame are saved into the IoCB in order to perform sequence management.
Command frames are typically 82 bytes according to one embodiment.

[125] The FCPM software module performs the IoCB allocation and saving of the command.
It reads in more data from the I-DS since only 64 bytes are brought in. Once this is done, the
processing of the command can begin.
[126] The SM software module is called next. This module determines whether an HLUN
exists for the command. An HLUN is a structure that ties together the server and a VLUN. The SM
extracts the source port from the FCBpage structure, and the FCLUN and DevHandle from the message
header, and feeds these to the tree search engine (TSE). If the TSE fails to yield a leaf, an HLUN
does not exist and the command is rejected. If the HLUN exists, the processing of the command
continues. This module also determines the command type. If it is not a RD/WR
command, it sends the frame to the SP for processing. The SM extracts the starting LBA and
number of blocks from the CDB and calls the LM software component to determine
the physical start LBA, number of blocks and the physical destination of the command.
[127] The LM is called with the HLUN search results in the TSR memory. From the HLUN,
the LM looks for the VLUN. The physical devices that represent a VLUN may be several disks
that may not start at LBA 0. Each physical device behind the VLUN is referred to as a slice.
The LM goes through the slices, determines which slice is called for in this IO request, and
calculates the new starting LBA and requested blocks. A request may cross slice boundaries. In
this case, the LM allocates child IoCBs and links them to the master request. After the
calculation is done, the LM searches for the target physical device. The LM fills in the FCBpage
with the destination SP number and target DMU/DSU, and fills in the Ethernet header with a
pHandle used by the downstream SP to search for the target device. The LM returns to the
SM the FCLUN of the physical target, the starting LBA and number of blocks.
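
The slice calculation may be pictured with the following minimal sketch. It assumes a VLUN built as a simple concatenation of slices and omits striping, mirroring and IoCB handling; the names Slice, PhysicalRequest and map_virtual_request are hypothetical and do not appear in the picocode.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Slice:
    device: str      # hypothetical name of the physical device
    start_lba: int   # first LBA of the slice on that device (need not be 0)
    num_blocks: int  # length of the slice in blocks


@dataclass(frozen=True)
class PhysicalRequest:
    device: str
    start_lba: int
    num_blocks: int


def map_virtual_request(slices: List[Slice], virt_lba: int,
                        num_blocks: int) -> List[PhysicalRequest]:
    """Translate a virtual (LBA, length) request into one physical
    request per slice touched, assuming concatenated slices."""
    if virt_lba < 0 or num_blocks <= 0:
        raise ValueError("invalid request")
    requests: List[PhysicalRequest] = []
    slice_base = 0          # virtual LBA at which the current slice begins
    cursor = virt_lba       # next virtual LBA still to be mapped
    remaining = num_blocks
    for s in slices:
        slice_end = slice_base + s.num_blocks
        if remaining > 0 and cursor < slice_end:
            offset = cursor - slice_base
            count = min(remaining, s.num_blocks - offset)
            requests.append(
                PhysicalRequest(s.device, s.start_lba + offset, count))
            cursor += count
            remaining -= count
        slice_base = slice_end
    if remaining:
        raise ValueError("request extends past the end of the VLUN")
    return requests
```

In this sketch, a request that returns more than one PhysicalRequest corresponds to the case in which the request crosses a slice boundary and child IoCBs would be linked to the master request.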
[128] From this information, the SM builds the FCP command payload and returns to the
FCPM. The FCPM writes back the frame from the datapool and enqueues the frame to the
switch module, ending the command processing on the upstream SP.

[129] Command frames downstream are handled as follows. 

[130] The command frame from the upstream SP is sent downstream. In the Ethernet
encapsulation header, the LLC field contains 2 pieces of important information: the upstream
handle and the pHandle.

[131] After receiving the command frame, the FCPM module allocates an IoCB for the
command. The upstream handle is used when the downstream SP needs to send the upstream SP
data or status frames related to that command. The upstream handle is sent together with the
data or status frames so that the upstream SP can find the context for the frame. After
receiving the command, the downstream SP sends the upstream SP an ACK/RDY frame
containing the downstream handle. This way both upstream and downstream SPs have each
other's handle. The peer handle is sent so that each side can get the frame's command context.
[132] The pHandle passed in can lead to either a PLUN lookup or a pDevpath lookup. This depends on the
pHandle MCAST bit. If the MCAST bit is set, there may be multiple paths to the
device. If clear, there is only a single path. If the PLUN is looked up, the LM decides from the leaf
which path to take. This leads to another search for the pDevpath. With a single path, the
pDevpath is looked up directly. The LM extracts the maxRxData size and target port. This
information is returned to the FCPM. The FCPM constructs a new FC header from the
information returned by the LM and ships the frame out.
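
The two lookup routes may be sketched as follows. The position of the MCAST bit, the dictionary-based tables, and the default policy of taking the first listed path are all assumptions made for illustration; the text does not describe the leaf layout or the path selection policy.

```python
from typing import Callable, Dict, List, Tuple

MCAST_BIT = 0x8000  # hypothetical bit position within the pHandle


def resolve_path(
    phandle: int,
    plun_table: Dict[int, List[int]],
    pdevpath_table: Dict[int, Dict[str, int]],
    choose: Callable[[List[int]], int] = lambda paths: paths[0],
) -> Tuple[int, int]:
    """Return (target_port, max_rx_data) for a pHandle.

    MCAST set:   look up the PLUN, pick one of its paths, then look up
                 that pDevpath (two searches).
    MCAST clear: look up the pDevpath directly (one search).
    """
    key = phandle & ~MCAST_BIT
    if phandle & MCAST_BIT:
        candidate_paths = plun_table[key]
        path_key = choose(candidate_paths)
    else:
        path_key = key
    path = pdevpath_table[path_key]
    return path["target_port"], path["max_rx_data"]
```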
[133] Data frames upstream are handled in the following manner.

[134] Data frames upstream can occur in a number of circumstances. For example, when the
server is sending write data, the data frames appear on the ingress side. When the downstream
SP is responding with read data, the data frames appear on the egress side.

[135] In the case of the server sending data, the FCPM looks for the IoCB using the returned
SPHandle (IOCB address) in the message header. From the IoCB, the SP knows where to ship
the data frame.

[136] In the case where data frames are on the egress side, the FCPM looks for the IoCB using
the handle passed in through the Ethernet LLC field. This is the IoCB address. From the IoCB,
the FCPM decides whether to ship the data to the server or whether it must wait for more data.
This may occur in the case of striping, where data may come in out of order.
[137] Data frames downstream are handled in the following manner.

[138] Data frames downstream can occur in a number of circumstances. For example, when
the device is responding with read data, these frames appear at the ingress side. When the
upstream SP is sending write data, these frames appear at the egress side. The way the SP looks
for the IoCB is the same as explained above.

[139] Status frames downstream are handled in the following manner. 
[140] Status frames downstream come from the target device. This happens when the 
requested operation completes or an exception has occurred. The status data comes together 
with the status frame. 

[141] The FCPM looks for the IoCB using the returned SPHandle (IOCB address) in the
message header. The upstream command handle is inserted into the Ethernet encapsulation
header. The status frame is shipped upstream and the IoCB is de-allocated. If there are multiple
paths to the device, other paths may be attempted.

[142] Status frames upstream are handled in the following manner. 

[143] Status frames upstream come from the downstream SP. The FCPM looks for the IoCB
using the command handle passed in the Ethernet encapsulation header. The SM subsystem is
called and a status frame is generated if necessary. The status frame is for a request on a virtual
device. The return status is from a physical device, and may not have the same context. Hence,
the SM may regenerate the status frame. The FCPM is finally called to transmit the frame back
to the server. After this happens, the IoCB is deallocated.
[144] FIG. 10 is a flow diagram of the processing steps in a read command. A read dataflow
goes through the same command, data and status phases described above. When a read
command is received, the SP decomposes the virtual request into the physical requests.
[145] The FCP protocol assumes that all the buffer space required for the command has already
been allocated on the server side. The SP is free to send data back to the server without waiting
for XFER_RDY. The flow control is handled by the FC port ASIC using the BB credit mechanism
on the FC side and PAUSE frames on the GbE side.

[146] In the simple case where the request maps to a single physical device, no reordering is
necessary. The path to the device is picked and the physical request is sent to the SP attached to
the device. In the case of mirroring, the upstream SP decides which member to read from. In the
case of a concatenation or stripe, the SP may generate additional requests from the original one.
As data comes back from the downstream SPs, the upstream SP reassembles the data in order
before sending it back to the server after NPSIM release.

[147] FIG. 11 is a flow diagram of the processing steps in a write command. When a write
command is received from the server, the SP determines the path to the physical device. The
write command is more complicated since an XFER_RDY frame should be sent to the server.
[148] The upstream SP preferably will not send an XFER_RDY to the server until it gets the
handle back from the downstream SP. The downstream SP preferably will not send the handle
back until it gets an XFER_RDY response back from the device.

[149] The upstream SP then sends an XFER_RDY to the server with the same byte count
indicated by the downstream SP, starting data flow from the server.
[150] The XFER_RDY is further adjusted to ensure that a data frame does not have to be split
across disk boundaries. This modification is done by the upstream SP, and the process continues
until the request is complete.
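
One way to picture the adjustment is the sketch below, which shortens each burst so that it ends at the next disk boundary. This is an interpretation offered for illustration: the function names, the representation of boundaries as byte offsets within the request, and the choice to clamp the whole burst (rather than individual frames) are assumptions, not details taken from the text.

```python
from typing import List, Sequence, Tuple


def adjust_burst_len(relative_offset: int, burst_len: int,
                     boundaries: Sequence[int]) -> int:
    """Shorten a burst so it stops at the first disk boundary that falls
    strictly inside it. `boundaries` must be sorted in ascending order."""
    for boundary in boundaries:
        if relative_offset < boundary < relative_offset + burst_len:
            return boundary - relative_offset
    return burst_len


def xfer_rdy_sequence(total_bytes: int, device_burst: int,
                      boundaries: Sequence[int]) -> List[Tuple[int, int]]:
    """List the (relative offset, burst length) pairs that would be
    announced to the server until the whole request is covered."""
    if device_burst <= 0:
        raise ValueError("device_burst must be positive")
    announced: List[Tuple[int, int]] = []
    offset = 0
    while offset < total_bytes:
        burst = min(device_burst, total_bytes - offset)
        burst = adjust_burst_len(offset, burst, boundaries)
        announced.append((offset, burst))
        offset += burst
    return announced
```

For example, a 10000-byte write with a 4096-byte device burst and a boundary at byte 6000 would be announced as three bursts, the second of which is shortened so that it ends exactly at the boundary.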

[151] As an optimization, the downstream SP can respond to the upstream SP with a fabricated
XFER_RDY. The byte count reported is set to the maximum receive size the device can accept,
which is negotiated during the PLOGI process. The upstream SP sends an XFER_RDY to the
server. This starts data flowing from the server.

[152] When the target device responds with the XFER_RDY, the downstream SP sends an
adjusted byte count back to the upstream SP. The upstream SP sends an XFER_RDY to the
server with the new adjusted byte count value. This method is an optimization and will be
considered later.

[153] FIG. 12 shows the Picocode software stacks. The messaging layer works to interpret the
Ethernet encapsulation, working with the Hardware Classifier to call the correct input functions.
The Fibre Channel protocol (FCP) manager keeps track of sequence management and IoCB
allocation/de-allocation. The SCSI manager (SM) layer interprets the SCSI commands inside the
FCP layer. The LUN Manager (LM) layer takes the virtual request that comes in from the server
and decomposes it into the physical requests. The utility layer has functions to allocate/de-allocate
IoCBs.

[154] This section describes the FCP Manager component in the picocode subsystem (see FIG.
12). In the Picocode software stack, the FCP may be considered the front end of the picocode
subsystem. In other words, the FCP Manager is set up to intercept all incoming frames. The SP
Hardware Classifier (HC) is configured to dispatch the incoming frame to an appropriate frame
handler based on the dispatch indicator discussed in the SP HC configuration section.
[155] Different frame handlers perform different sets of actions to fulfill the task. The course of
actions performed by those handlers is discussed in the FCP Public Interface section.

[156] The SP Hardware Classifier Configuration is as follows. Depending on the side of the SP
on which the incoming frame arrives, the SP Hardware Classifier (HC) keys on different fields to
dispatch a frame. On the Ingress side, the HC dispatches an incoming frame based on E-type. On
the Egress side, it dispatches an incoming frame based on UC/MC, FHEF, and VSHF.

[157] The FC ASIC and SP communicate via Command Descriptor (CD) frames 300 as shown
in FIG. 8. While there are certain requirements that both the FC ASIC and the SP need to ensure in
preparing the CD frame header (which includes the fields ADDR, CTRL, PROTOCOL and
TAGS 302 in FIG. 8), this section summarizes the CD frame header manipulation on the two main
IO paths, Read and Write.

[158] FIG. 13 depicts the CD frame header manipulation on a Read command. FIG. 14 depicts
the CD frame header manipulation on a Write command.

[159] The FCP Public Interfaces as part of the picocode (see FIG. 12) include the following:
Ingress Command Handler, Egress Command Handler, Ingress Data Handler, Egress Data
Handler, Ingress Status Handler, Egress Status Handler, Ingress Xfer Ready Handler, Egress
Xfer Ready Handler, Ingress Send Command Handler, Egress Send Good Status Handler, Egress
Send Bad Status Handler, Egress Send New Status Handler, and Discard I-Frame Handler.

[160] The Ingress Command Handler function is used to handle a command frame sent from the
server to SP-Ingress. The entry point is fcp_cmd_i, and the path is
UPSTREAM - INGRESS - COMMAND. The hardware classifier dispatches to this function based
on the VSX programmable E-type (CMD-I). Input includes the portion of the command frame in the
Data Pool (64 bytes).

[161] Functions of the Ingress Command Handler include allocating an IOCB from the IOCB_pool
(util_iocb_alloc), with the allocated IOCB's ID passed in w20, and reading the IOCB content into
ScratchMem1 (4 QWs). The Ingress Command Handler also checks the frame ID of the
incoming frame, initializes the expected inbound, outbound, and internal frame IDs, and extracts the
essential information from the FC frame and stores it in the IOCB (SP; FC_Handle).

[162] Other functions of the Ingress Command Handler include setting SP_Handle and
SP_Qualifier into the frame tag, copying the FC command frame to the staging area of the IOCB,
storing the identification of the second I-DS buffer that contains IOCB information in w22, and
filling its own handle into the command frame.
[163] Further functions of the Ingress Command Handler include storing the IOCB and staging
area addresses into w28 and w30 respectively, and calling the SCSI Manager to process the SCSI
command (sm_cmd_i). The IOCB image in ScratchMem1 is updated but not the real IOCB's
content. (The updated information should be flushed out after returning from the function.)
[164] The passing information includes the command frame in the Data Pool (96 bytes), the IOCB
address in w28, the IOCB staging area address in w30, and the content of the IOCB in ScratchMem1 (64
bytes - 4 QWs). The Ingress Command Handler may then exit.
[165] The Egress Command Handler function is used to handle a command frame sent from the
Initiator-Rainier to SP-Egress. The entry point is fcp_cmd_e, and the path is
DOWNSTREAM - EGRESS - COMMAND.

[166] The Hardware Classifier dispatches to this function based on {iUCnMC, FHE, and FHF}.
Inputs include the portion of the command frame in the Data Pool (64 bytes), and R0 contains the offset
to the first byte of the frame header.

[167] Due to a buffer size mismatch, the "local-command-handle" is not sent back to the Initiator
Rainier until the XFER_RDY is received from the target device.

[168] Functions of the Egress Command Handler include validating the incoming E_frame
(fcp_filter_fcp_efrm), and allocating an IOCB from the IOCB_pool (util_iocb_alloc), with the
IOCB_Alloc function ensuring that the allocated IOCB's ID is in w20.
[169] In addition, the Egress Command Handler may read the IOCB content into ScratchMem1
(4 QWs), store the IOCB and staging area addresses into w28 and w30 respectively, and check the frame
ID of the incoming frame. Other functions of the Egress Command Handler include initializing
the expected inbound, outbound, and internal frame IDs, saving the peer_Handle and peer_Qualifier
into the IOCB, initializing the FC_Handle to 0xFFFF, and zeroing out the control field.
[170] The Egress Command Handler also calls the LM to perform the pLun lookup (lm_cmd_e).
The lm_cmd_e function ensures that the target port and the MaxRxData are set in the IOCB.
[171] The Egress Command Handler further calls the FCP to send the command frame to the target
device (fcp_snd_cmd_e), and enqueues the IOCB into the port active queue. The Egress Command
Handler may then flush the updated IOCB information from ScratchMem1 to E-DS, and then
exit.

[172] The Ingress Data Handler function is used to handle a data frame sent from the server to
SP-Ingress. The entry point is fcp_data_i, and the path is
UPSTREAM/DOWNSTREAM - INGRESS - DATA. The Hardware Classifier dispatches to this
function based on the VSX programmable E-type (DATA-I). Inputs include a portion of the
command frame in the Data Pool (32 bytes).

[173] Functions of the Ingress Data Handler include validating the IOCB address (the returned
SP_Handle in the received frame), and reading the IOCB content into ScratchMem1 (8 QWs).
The Ingress Data Handler may match the frame content with the IOCB content by checking the
following fields: SP_Qualifier, Frame_ID, FC_Handle (if it is not the first data frame),
and routing information (fcp_val_ri_i).

[174] The Ingress Data Handler may also save FC_Handle into the IOCB (on the first data frame),
update the frame ID, peer_Handle, and peer_Qualifier, and call the FCP to send the data frame to the
other Rainier (Initiator/Target-Rainier) (fcp_snd_data_i). The Ingress Data Handler may further
flush the updated IOCB information from ScratchMem1 to E-DS, and then exit.

[175] The Egress Data Handler function is used to handle a data frame sent from the
Initiator/Target-Rainier to SP-Egress. The entry point is fcp_data_e, and the path is
UPSTREAM/DOWNSTREAM - EGRESS - DATA.
[176] The Hardware Classifier dispatches to this function based on {iUCnMC, FHE, and FHF}.
Inputs include a portion of the data frame in the Data Pool (32 bytes).

[177] Functions performed by the Egress Data Handler include validating the IOCB address
(the passed peerOrPHandle in the received frame), and reading the IOCB content into
ScratchMem1 (8 QWs). The Egress Data Handler may also match the frame content with the IOCB
content by checking the following fields: Own_Qualifier, Frame_ID, peer_Handle and
peer_Qualifier (if it is not the first data frame), and routing information (fcp_val_ri_e)
(if it is not the first data frame).

[178] The Egress Data Handler may also save peer_Handle, peer_Qualifier, and the completed
routing information into the IOCB (on the first data frame), swap the source and destination routing
information, and update the FC_Handle, SP_Handle, SP_Qualifier, frame_ID, port_Handle,
port_Number, and the frame control field.

[179] The Egress Data Handler may call the FCP to send the data frame to the destination device
(Initiator/Target device) (fcp_snd_data_e), update the running-byte-count field in the IOCB, flush
the updated IOCB information from ScratchMem1 to E-DS, and then exit.

[180] The Ingress Status Handler function is used to handle a status frame sent from the target
device to SP-Ingress. The entry point is fcp_status_i, and the path is
DOWNSTREAM - INGRESS - STATUS. The Hardware Classifier dispatches to this function
based on the VSX programmable E-type (STS-I). Inputs include a portion of the command frame in
the Data Pool (64 bytes).

[181] Functions performed by the Ingress Status Handler include validating the IOCB address
(the returned SP_Handle in the received frame), and reading the IOCB content into
ScratchMem1 (8 QWs). Frame content is matched with IOCB content by checking the following
fields: SP_Qualifier, Frame_ID, FC_Handle (if it is not the first frame), and routing
information (fcp_val_ri_i).

[182] The Ingress Status Handler further saves FC_Handle into the IOCB (on the first frame),
updates the frame ID, peer_Handle, and peer_Qualifier, and calls the FCP to send the status frame to the
Initiator-Rainier (fcp_snd_sts_i). The Ingress Status Handler also removes the IOCB from the
port active queue (util_remove_this), returns the IOCB to the free IOCB-Pool, and then exits.
[183] The Egress Status Handler function is used to handle a status frame sent from the
Initiator-Rainier to the Host. The entry point is fcp_status_e, and the path is
UPSTREAM - EGRESS - STATUS.

[184] The Hardware Classifier dispatches to this function based on {iUCnMC, FHE, and FHF}.
Input includes a portion of the command frame in the Data Pool (32 bytes). It is assumed that the SM is
responsible for building the status payload.
[185] Functions of the Egress Status Handler include validating the IOCB address (the passed
peerOrPHandle in the received frame), and reading the IOCB content into ScratchMem1 (8
QWs). The frame content is matched with the IOCB content by checking the following fields:
own_Qualifier, Frame_ID, peer_Handle and peer_Qualifier (if it is not the first frame),
and routing information (fcp_val_ri_e) (if it is not the first frame).
[186] Other functions of the Egress Status Handler include saving peer_Handle, peer_Qualifier,
and the completed routing information into the IOCB (on the first frame). Other functions include
swapping the source and destination routing information, storing the IOCB address in w28, and
calling the SCSI Manager to log the IO status (sm_status_e).

[187] Passing information includes the data frame in the Data Pool (32 bytes), the IOCB address in
w28, and the content of the IOCB in ScratchMem1 (8 QWs). The Egress Status Handler then exits.
[188] The Ingress Xfer Ready Handler function is used to handle an xferRdy frame sent from the
target device to SP-Ingress. The entry point is fcp_xfr_rdy_i, and the path is
DOWNSTREAM - INGRESS - XFER_READY.

[189] The Hardware Classifier dispatches to this function based on the VSX programmable E-type
(XFRRDY-I). Input includes a portion of the command frame in the Data Pool (64 bytes).
[190] The Ingress Xfer Ready Handler functions to validate the IOCB address (the returned
SP_Handle in the received frame), and to read the IOCB content into ScratchMem1 (8 QWs).
The frame content is matched with the IOCB content by checking the following fields: SP_Qualifier,
Frame_ID, FC_Handle (if it is not the first data frame), and routing information
(fcp_val_ri_i).

[191] The Ingress Xfer Ready Handler further confirms that Data_RO (in the XfrRdy payload) is
the same as IOCB.running-byte-cnt; otherwise it calls the FCP error handler (fcp_invalid_xfrrdy).
The Ingress Xfer Ready Handler also saves FC_Handle into the IOCB (on the first data frame),
updates IOCB.xfrrdy with the BURST_LEN (in the XfrRdy payload), and updates the frame ID,
peer_Handle, and peer_Qualifier. The Ingress Xfer Ready Handler calls the FCP to send the xferRdy
frame to the other Rainier (Initiator-Rainier) (fcp_snd_xfr_rdy_i), flushes the updated IOCB
information from ScratchMem1 to E-DS, and exits.
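
The Data_RO consistency check may be sketched as follows. The Iocb class, the exception that stands in for the FCP error handler, and the function names are hypothetical; only the comparison of Data_RO against the running byte count and the recording of BURST_LEN are taken from the description.

```python
from dataclasses import dataclass


class InvalidXferRdy(Exception):
    """Stands in for the FCP error handler in this sketch."""


@dataclass
class Iocb:
    running_byte_cnt: int = 0  # bytes already transferred for this command
    xfrrdy: int = 0            # BURST_LEN of the most recent XfrRdy


def on_xfer_rdy(iocb: Iocb, data_ro: int, burst_len: int) -> None:
    """Accept an XfrRdy only if its relative offset matches the number
    of bytes transferred so far, then record its burst length."""
    if data_ro != iocb.running_byte_cnt:
        raise InvalidXferRdy(
            f"Data_RO {data_ro} != running byte count {iocb.running_byte_cnt}")
    iocb.xfrrdy = burst_len


def on_data_sent(iocb: Iocb, nbytes: int) -> None:
    """Advance the running byte count as data frames are forwarded."""
    iocb.running_byte_cnt += nbytes
```

In this sketch an XfrRdy that repeats or skips an offset is rejected, because its Data_RO no longer matches the count accumulated from the data frames already forwarded.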
[192] The Egress Xfer Ready Handler function is used to handle an xferRdy frame sent from the
target Rainier to SP-Egress. The entry point is fcp_xfr_rdy_e, and the path is
UPSTREAM - EGRESS - XFER_READY. The Hardware Classifier dispatches to this function
based on {iUCnMC, FHE, and FHF}. Input includes a portion of the data frame in the Data Pool (32
bytes).

[193] Functions performed by the Egress Xfer Ready Handler include validating the IOCB
address (the passed peerOrPHandle in the received frame), and reading the IOCB content into
ScratchMem1 (8 QWs). Frame content is matched with IOCB content by checking the following
fields: Own_Qualifier, Frame_ID, peer_Handle and peer_Qualifier (if it is not the first
data frame), and routing information (fcp_val_ri_e) (if it is not the first data frame).
[194] The Egress Xfer Ready Handler also saves peer_Handle, peer_Qualifier, and the source
routing information into the IOCB (on the first data frame), swaps the source and destination routing
information, and confirms that Data_RO (in the XfrRdy payload) is the same as
IOCB.running-byte-cnt; otherwise it calls the FCP error handler (fcp_invalid_xfrrdy). The Egress Xfer
Ready Handler updates IOCB.xfrrdy with the BURST_LEN (in the XfrRdy payload), and
updates the FC_Handle, SP_Handle, SP_Qualifier, frame_ID, port_Handle, port_Number, and
the frame control field. The Egress Xfer Ready Handler calls the FCP to send the xferRdy frame to the
initiator device (fcp_snd_xfr_rdy_e), flushes the updated IOCB information from ScratchMem1
to E-DS, and exits.
[195] The Ingress Send Command Handler function is used to flush the updated frame and
IOCB's content to I-DS and E-DS respectively; it then enqueues the frame to I-EDS. The
entry point is fcp_snd_cmd_i, and the path is UPSTREAM - INGRESS - COMMAND. The
caller is the SM.

[196] Inputs comprise the frame's content in Datapool (the number of QWs that contain
valid information should be predefined and ensured), the updated IOCB's content in
ScratchMem1 (the number of QWs that contain valid information should be predefined and
ensured), and that the E-type, command payload, and destination information have been built. Other
inputs include IOCB.TBO stored in w28, and the identification of the second I-DS buffer that
contains the command frame stored in w22.

[197] Functions performed by the Ingress Send Command Handler include flushing the updated
information from the Data Pool to I-DS, and sending the frame to the Target by setting up the
FCBPage {iUCMC, FHF, FHE} and enqueuing the frame to I-EDS. The Ingress Send
Command Handler also flushes the updated IOCB information from ScratchMem1 to E-DS.
[198] The Egress Send Good Status Handler function is used to flush the updated frame to
E-DS and enqueue the frame to E-EDS. The entry point is fcp_snd_gdsts_e, and the path is
UPSTREAM - EGRESS - STATUS. The caller is the SM. Inputs comprise the frame's content in the
Data Pool, and that the status payload and destination information have been built.
[199] Functions of the Egress Send Good Status Handler include modifying the FC-frame
(FC_Handle, SP_Handle, SP_Qualifier, frame_ID, FC_Port_Handle, and Port Number). The
function does not need to swap the routing information because fcp_val_ri_e() has done it
already. The Egress Send Good Status Handler also flushes the updated information from the Data
Pool to E-DS (3 QWs starting from the second QW in the Data Pool). The frame is sent to the
Initiator/Host by setting up the FCBPage {QID} and enqueuing the frame to E-EDS.
[200] The Egress Send Bad Status Handler function is used to flush the updated frame to E-DS
and enqueue the frame to E-EDS. The entry point is fcp_snd_badsts_e, and the path is
UPSTREAM - EGRESS - STATUS. The caller is the SM. Inputs include the frame's content in the
Data Pool, that the status payload and destination information have been built, and the size of the
response payload in bytes, passed through w20.
[201] Functions of the Egress Send Bad Status Handler include modifying the FC-frame
(FC_Handle, SP_Handle, SP_Qualifier, frame_ID, FC_Port_Handle, and Port Number). (The
function does not need to swap the routing information because fcp_val_ri_e() has done it
already.) The updated information is flushed from the Data Pool to E-DS (the number of QWs to be
flushed is calculated from the size of the response payload passed by the SM). The frame is sent to
the Initiator/Host by setting up the FCBPage {QID} and enqueuing the frame to E-EDS.

[202] The Egress Send New Status Handler function is used to build a new status frame and
send it to the host. The entry point is fcp_snd_new_sts_e, and the path is UPSTREAM - EGRESS
- STATUS. The caller is the SM.

[203] Inputs include the frame's content in Datapool (the number of QWs that contain valid
information should be predefined and ensured), that the status payload and destination information
have been built, and the size of the response payload in bytes, passed through w20.
[204] Functions of the Egress Send New Status Handler include modifying the FC-frame
(POS-Header, Protocol, FC_Handle, SP_Handle, SP_Qualifier, frame_ID, FC_Port_Handle, and
Port Number), setting up the control information, and setting the POS trailer. Other functions
include allocating a new twin buffer to store the status frame content, building a new FCBPage
with the essential information, and flushing the updated information from the Data Pool to E-DS
(the number of QWs to be flushed is calculated from the size of the response payload passed by
the SM). The frame is sent to the Initiator/Host by setting up the
FCBPage {QID} and enqueuing the frame to E-EDS.
[205] The Discard I-Frame Handler function is used to discard the Ingress incoming frame.
The entry point is fcp_discard_i, and the path is XXX - INGRESS - XXX. The caller is the SM. It
is assumed that the discarded frame information is stored in the active FCBPage. The function
of the Discard I-Frame Handler is to enqueue the frame to the ingress discard queue (i.e., I-DDQ).

[206] The Discard E-Frame Handler function is used to discard the Egress incoming frame.
The entry point is fcp_discard_e, and the path is XXX - EGRESS - XXX. The caller is the FCP. It
is assumed that the discarded frame information is stored in the active FCBPage. The Discard
E-Frame Handler functions to enqueue the frame to the egress discard queue (i.e., E-DDQ).

[207] The following is a list of FCP private interfaces performed by the picocode (see FIG. 12):
Egress Send Command Handler, Ingress Send Data Handler, Egress Send Data Handler, Ingress
Send Status Handler, Ingress Send Transfer Ready Handler, Egress Send Transfer Ready
Handler, Ingress Send Handle Response, Egress Filter FC-frame, Egress Invalid Check Sum,
Ingress Validate Frame Routing Information, Egress Validate Frame Routing Information,
Ingress Invalid FC Frame Information, Egress Invalid FC Frame Information, and Discard
E-Frame Handler.

[208] The Egress Send Command Handler Entry Point function is used to flush the updated
frame to E-DS and enqueue the frame to E-EDS. The entry point is fcp_snd_cmd_e, and the
path is DOWNSTREAM - EGRESS - COMMAND. The caller is the FCP.
[209] Inputs include the frame's content in the Data Pool, and that the command payload and
destination information have been built. It is assumed that the LM is responsible for preparing the
FCB-Page, the frame routing information, and the port handle.
[210] Functions of the Egress Send Command Handler Entry Point include swapping the source
and destination routing information, setting the port number in the outgoing frame, and flushing
the updated information from the Data Pool to E-DS. The frame is sent to the Target Device by
setting up the FCBPage {QID} and enqueuing the frame to E-EDS.

[211] The Ingress Send Data Handler function is used to flush the updated frame to I-DS and
enqueue the frame to I-EDS. The entry point is fcp_snd_data_i, and the path is
UPSTREAM/DOWNSTREAM - INGRESS - DATA. The caller is the FCP. Inputs include the
frame's content in Datapool, and that the data payload and destination information have been built.
[212] Functions of the Ingress Send Data Handler include flushing the updated information
from the Data Pool to I-DS, and sending the frame to the other Rainier by setting up the FCBPage
{iUCMC, FHF, FHE, TB, TDMU, iDSU} and enqueuing the frame to I-EDS.
[213] The Egress Send Data Handler function is used to flush the updated frame to E-DS and
enqueue the frame to E-EDS. The entry point is fcp_snd_data_e, and the path is
UPSTREAM/DOWNSTREAM - EGRESS - DATA. The caller is the FCP.
[214] Inputs comprise the frame's content in Datapool, and that the data payload and destination
information have been built. Functions of the Egress Send Data Handler include flushing the
updated information from the Data Pool to E-DS, and sending the frame to the Initiator-Rainier by
setting up the FCBPage {QID} and enqueuing the frame to E-EDS.

[215] The Ingress Send Status Handler function is used to flush the updated frame to I-DS and
enqueue the frame to I-EDS. The entry point is fcp_snd_sts_i, and the path is
DOWNSTREAM - INGRESS - STATUS. The caller is FCP.
[216] Inputs include the frame's content in Data Pool, with the status payload and destination
information already built. The Ingress Send Status Handler function flushes the
updated information from Data Pool to I-DS and sends the frame to the Initiator-Rainier by
setting up the FCBPage {iUCMC, FHF, FHE, TB, TDMU, iDSU} and enqueuing the frame to
I-EDS.

[217] The Ingress Send Transfer Ready Handler function is used to flush the updated frame to
I-DS and enqueue the frame to I-EDS. The entry point is fcp_snd_xfr_rdy_i, and the path is
DOWNSTREAM - INGRESS - XFR READY. The caller is FCP, and the input is the frame's
content in Data Pool. The Ingress Send Transfer Ready Handler flushes the updated
information from Data Pool to I-DS and sends the frame to the Initiator-Rainier by setting up the
FCBPage {iUCMC, FHF, FHE, TB, TDMU, iDSU} and enqueuing the frame to I-EDS.
[218] The Egress Send Transfer Ready Handler function is used to flush the updated frame to
E-DS and enqueue the frame to E-EDS. The entry point is fcp_snd_xfr_rdy_e, and the path is
UPSTREAM - EGRESS - XFR READY. The caller is FCP. The input is the frame's content in
Data Pool. Functions of the Egress Send Transfer Ready Handler include flushing the updated
information from Data Pool to E-DS, and sending the frame to the Initiator-Rainier by setting up
the FCBPage {QID} and enqueuing the frame to E-EDS.

[219] The Ingress Send Handle Response function is used by the Target Rainier to pass the
command handle back to the Initiator Rainier. The entry point is fcp_snd_hndl_resp_i, and the path
is DOWNSTREAM - INGRESS - COMMAND. The caller is FCP. The input is the frame's
content in Data Pool (6 words).

[220] Functions of the Ingress Send Handle Response include leasing the I-DS buffer, building
the handle response frame, and sending the frame to the Initiator Rainier by setting up the FCBPage2
{iUCMC, FHF, FHE, TB, TDMU, WBC} and enqueuing the frame to I-EDS.

[221] The Egress Filter FC-frame function is used to validate the egress-incoming frame. The
entry point is fcp_filter_fcp_efrm, and the path is XXX - EGRESS - XXX. The caller is FCP.
The input is the frame's content in Data Pool (6 QWs for a command frame, 4 QWs for others).
Functions of the Egress Filter FC-frame include performing the checksum and returning to the
caller if everything is okay; otherwise, the error event handler (fcpInvalCheckSumEFrm) is invoked.

[222] The Egress Invalid Check Sum function is used to handle a checksum error on any egress
frame. The entry point is fcpInvalCheckSumEFrm. The caller is FCP. Functions of the Egress
Invalid Check Sum include logging errors and discarding the frame (i.e., queuing the frame to
E-DDQ).

[223] The Ingress Validate Frame Routing Information function is used to validate the frame
routing information. The entry point is fcp_val_ri_i. The caller is FCP. Inputs include the
frame's content in Data Pool, and the IOCB's content in ScratchMem1. Functions include comparing
the routing information within the IOCB and the incoming frame, and invoking fcpInvalIFrmInfo
to handle the error if there is a mismatch.

[224] The Egress Validate Frame Routing Information function is used to validate the frame
routing information. The entry point is fcp_val_ri_e. The caller is FCP. Inputs include the
frame's content in Data Pool, and the IOCB's content in ScratchMem1. Functions include
comparing the routing information within the IOCB and the incoming frame, and invoking
fcpInvalEFrmInfo to handle the error if there is a mismatch; otherwise, swapping the
frame's routing information.
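
The egress validation step can be illustrated with a minimal C sketch. The routing_info_t layout, the function name val_ri_e, and the rule that the frame must carry exactly the source and destination recorded in the IOCB are assumptions made for illustration; the text does not specify the routing information format or the comparison rule.

```c
#include <stdint.h>

/* Hypothetical routing information: one source and one destination identifier. */
typedef struct {
    uint32_t src;
    uint32_t dst;
} routing_info_t;

/*
 * Compare the routing information saved in the IOCB with that of the
 * incoming egress frame. On a match, swap the frame's source and
 * destination in place and return 0. On a mismatch, leave the frame
 * untouched and return -1 so that the caller can invoke the
 * invalid-frame-information handler.
 */
int val_ri_e(const routing_info_t *iocb_ri, routing_info_t *frame_ri)
{
    if (iocb_ri->src != frame_ri->src || iocb_ri->dst != frame_ri->dst) {
        return -1;
    }
    uint32_t tmp = frame_ri->src;
    frame_ri->src = frame_ri->dst;
    frame_ri->dst = tmp;
    return 0;
}
```

In this sketch the ingress variant would be the same comparison without the swap.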

[225] The Ingress Invalid FC Frame Information function is used to handle mismatched
information between IOCB and frame content. The entry point is fcpInvalIFrmInfo, and the
caller is FCP. Functions include logging errors and discarding the frame (i.e., queuing the frame
to I-DDQ).

[226] The Egress Invalid FC Frame Information function is used to handle mismatched
information between IOCB and frame content. The entry point is fcpInvalEFrmInfo, and the
caller is FCP. Functions include logging errors and discarding the frame (i.e., queuing the frame
to E-DDQ).

[227] The Discard E-Frame Handler function is used to discard the Egress incoming frame. Its
functions include enqueuing the frame to the egress discard queue (i.e., E-DDQ).

[228] This section describes the SCSI Manager component (SM) in the picocode subsystem
(see FIG. 12). The main responsibility of SM is to process the SCSI-specific information from
the frames. On each command frame that comes in from the server, SM determines whether a
HLUN exists. It uses FC-LUN, DevHandle, and the entry port in the SP to build the key and
send it to the tree search engine. If the search is successful, it then passes the result to LM
together with the start LBA and number of blocks. Otherwise it will try to either reject the
command or send it to the SP to handle the command.
[229] The LM will pick the path, physical target, and LBA, and pass them back. The SM will then
modify the LBA in the CDB and send the command to the FCP, which sends it to the target SP.

[230] The SM uses an Opcode Classifier Table to decide how to act on a SCSI command.
The Opcode Classifier Table is an array of 256 elements that are allocated from Control Store
memory. Each element contains a number of flags.

[231] These flags are as follows. Is-Read-Opcode, when set, identifies the opcode as a read (e.g.,
Read 10). Is-Write-Opcode, when set, identifies the opcode as a write (e.g., Write 10).
Is-Reserve-Opcode, when set, identifies the opcode as a reservation (e.g., Reserve 6).
Is-Release-Opcode, when set, identifies the opcode as a release (e.g., Release 6).
Opcode-Is-Allowed-Without-HLUN, when set, identifies the opcode as allowed whether the
LUN exists or not (e.g., Report LUNS). Opcode-Is-Allowed-With-UA-Set, when set, identifies
the opcode as allowed when the Unit Attention condition on the LUN is set (e.g., Inquiry).
Opcode-Is-Not-Affected-By-Reservations, when set, identifies the opcode as not affected by a
reservation conflict (e.g., Read Block Limits).

[232] The flags in each element are initialized according to its position in the table. SM uses
the SCSI opcode from the command frame to index into this table. Based on the flags from the
table, SM can decide which code path to take. When looking up the opcode classifier, the
following formula is used:

[233] Classifier address = Classifier-Table-Address + SCSI-Opcode
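
The table lookup can be sketched in C as follows. This is an illustration only: the bit positions of the flags, the one-byte element size, and all identifier names are assumptions, since the text names the flags but not their encoding. The opcode values used to initialize the table are the standard SCSI operation codes for the commands named above.

```c
#include <stdint.h>

/* Flag bits for one classifier element (bit positions are assumed). */
enum {
    OPC_IS_READ              = 1u << 0,
    OPC_IS_WRITE             = 1u << 1,
    OPC_IS_RESERVE           = 1u << 2,
    OPC_IS_RELEASE           = 1u << 3,
    OPC_ALLOWED_WITHOUT_HLUN = 1u << 4,
    OPC_ALLOWED_WITH_UA      = 1u << 5,
    OPC_NOT_AFFECTED_BY_RSV  = 1u << 6
};

/* One element per possible SCSI opcode byte: 256 elements. */
static uint8_t classifier_table[256];

/* Initialize each element according to its position (opcode) in the table. */
void classifier_init(void)
{
    classifier_table[0x28] = OPC_IS_READ;              /* READ(10)          */
    classifier_table[0x2A] = OPC_IS_WRITE;             /* WRITE(10)         */
    classifier_table[0x16] = OPC_IS_RESERVE;           /* RESERVE(6)        */
    classifier_table[0x17] = OPC_IS_RELEASE;           /* RELEASE(6)        */
    classifier_table[0xA0] = OPC_ALLOWED_WITHOUT_HLUN; /* REPORT LUNS       */
    classifier_table[0x12] = OPC_ALLOWED_WITH_UA;      /* INQUIRY           */
    classifier_table[0x05] = OPC_NOT_AFFECTED_BY_RSV;  /* READ BLOCK LIMITS */
}

/* Classifier address = Classifier-Table-Address + SCSI-Opcode */
uint8_t classify(uint8_t scsi_opcode)
{
    return *(classifier_table + scsi_opcode);
}

/* Nonzero if the opcode is a read or a write. */
int is_read_or_write(uint8_t scsi_opcode)
{
    return (classify(scsi_opcode) & (OPC_IS_READ | OPC_IS_WRITE)) != 0;
}
```

Because the opcode is a single byte, the 256-element table covers every possible index and the lookup needs no bounds check.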

[234] The SM features a number of public interfaces. One is the Upstream Ingress Command
(E-Type=CMD-I). This entry point handles a command frame that comes in from the server
through the ingress side. The entry point is Sm_cmd_i, and it is called by FCP. This public
interface expects the Command Frame in Data Pool (96 bytes), the IOCB address in w28, the
IOCB staging area address in w30, and the IOCB in Scratch 1. The public interface also expects
its own handle to be filled in the command frame (upstream handle), and its own and peer FC-IDs
to be saved in the IOCB.

[235] Steps taken by the Upstream Ingress Command include starting the search for the hlun
(DevHandle, FCLUN, Port), and translating the information from the command frame and saving it
to the IOCB if needed, including the LBA, the number of blocks, the total byte count, and the data
direction. Other steps include initializing the running byte count, getting the search result
(expected in TSR0), calling sm_no_hlun_handler if the hlun does not exist, and calling
sm_no_rdwr_handler if the opcode is not read or write. Values passed to LM include the Vlun LBA
in w24, the number of blocks in w26, R20 (0=Command is not a Read/Write), and the
search result in TSR0.

[236] Another step taken includes calling lm_cmd_i. Expected return values from LM include the
Plun LBA in w24, the number of blocks in w26, the status code in r20, the native device flag in
r18 (zero=native device), the Target Device FC-LUN in r21, the Target Blade filled in the FCB page
and IOCB, and the PlunHandle filled in the command frame.

[237] If the device is not a native device, the LBA and the number of blocks in the CDB (data pool
memory) are modified. Other steps include filling in the target device FC-LUN in the command
frame, setting the e-type to CMD-E, enqueuing the IOCB to the port active queue, and calling
fcp_snd_cmd_i to send the command to the target SP. FCP will update the data pool to I-DS and
scratch 1 to CS.

[238] Another public interface is the Upstream Egress Status (E-Type=Stat-E), which handles
the status frame from a target device that comes in from the egress side via the downstream SP.
The entry point is Sm_status_e, and the caller is FCP.

[239] This interface expects the FC Response frame in Data Pool (64 bytes), the IOCB address
in w28, and the IOCB in Scratch 1. Steps taken include calling fcp_discard_e and returning if the
status is not from the last child, modifying the response code in data pool as needed, dequeuing the
IOCB from the port active queue, calling fcp_snd_sts_e to send the status frame to the server,
and returning the IOCB to the free pool.

[240] The following public interfaces do not involve the SM: Downstream Egress Command
(E-Type=CMD-E), Downstream Ingress Data (E-Type=Data-I), Upstream Egress Data
(E-Type=Data-E), and Downstream Ingress Read Status (E-Type=Stat-I).
[241] The SCSI Manager has two internal interfaces: sm_no_hlun_handler and
sm_no_rdwr_handler.

[242] The sm_no_hlun_handler entry point handles a command frame that targets a
non-existent hlun, and it is called by SM. This interface expects the Command Frame in Data Pool
(96 bytes), the IOCB address in w28, the IOCB staging area address in w30, and the IOCB in
Scratch 1.

[243] Steps taken include calling sm_no_rdwr_handler if the opcode needs to be handled by the
e405 (e.g., Inquiry, Report LUNs), calling fcp_discard_i to discard the I-DS buffer, building the
status payload in data pool, and calling fcp_snd_new_sts_e. Note that FCP will allocate a new twin
buffer, build a new FCB page, and send the frame to the server.

[244] The sm_no_rdwr_handler entry point handles a command frame other than read or write,
and is called by SM. This interface expects the Command Frame in Data Pool (96 bytes), the
IOCB address in w28, the IOCB staging area address in w30, and the IOCB in Scratch 1.

[245] Steps taken include calling fcp_discard_i to discard the I-DS buffer, enqueuing the IOCB
to the port active queue, and sending the command to the SP to be handled.

[246] This section describes the Lun Manager component in the picocode subsystem (see FIG.
12). The LM subsystem is in charge of decomposing a virtual request into physical ones. The
LM subsystem looks at the starting LBA and number of blocks in a request from a server, and
determines whether the device is a native device or a virtual device.

[247] The LM subsystem also identifies the start LBA and number of blocks of the physical
request, and decomposes the virtual request into several physical IOs as needed. The LM
subsystem determines where the new physical request should be sent.

[248] Information kept in tables on the e405/lc440 (virtual server card) does not have to be
duplicated in its entirety on the SP, since the SP only handles a small subset of commands and
because of the leaf size limitation on the TSE. Many of the byte fields and half word fields have
been merged into 32-bit words in order to save cycles when accessing the tree search memory.
The word fields will then have to be decomposed by picocode. This is faster since each pico
thread has its own register set. With TS memory, there is contention from the other threads.
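
The merging of small fields into one 32-bit word, and the later decomposition in registers, can be sketched in C as follows. The field positions and the function names are hypothetical; the text does not give the actual bit layout of any merged word.

```c
#include <stdint.h>

/*
 * Hypothetical layout of one merged leaf word:
 *   bits 31..24  first byte field
 *   bits 23..16  second byte field
 *   bits 15..0   half-word field
 */
uint32_t leaf_word_pack(uint8_t b0, uint8_t b1, uint16_t hw)
{
    return ((uint32_t)b0 << 24) | ((uint32_t)b1 << 16) | (uint32_t)hw;
}

/* Decompose the word with shifts and masks after a single memory read. */
uint8_t  leaf_word_b0(uint32_t w) { return (uint8_t)(w >> 24); }
uint8_t  leaf_word_b1(uint32_t w) { return (uint8_t)((w >> 16) & 0xFFu); }
uint16_t leaf_word_hw(uint32_t w) { return (uint16_t)(w & 0xFFFFu); }
```

One read of the shared tree search memory then yields three fields, and the shifts and masks run on thread-local registers.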

[249] The HLUN structure ties the server to a VLUN. The HLUN entry contains a
VLUNkey and a SERVERkey. If the Tree Search lookup does not yield a leaf, this means that the
server is not assigned to see the LUN requested. The key fed into the TSE to yield a HLUN is
the source port of the command, the FCLUN, and the DevHandle from the message header.

[250] A HLUN is a binding between a server LUN and a VSX VLUN. The key for the server
pdevpath is used to look up the server structure. The key for the VLUN is used to look up the
VLUN structure.

[251] The HLUN Leaf Structure is given as follows.
[252]

hlun STRUCT
    vlunKey         word    ;key used to look for the VLUN
    initiatorKey    word    ;key used to look for the server pdevpath
    flags           word    ;ua29, lock/zoning, etc.
    fcaPortNpPort   byte    ;source fcaPort (2b), npPort (6b)
    linkCB          word    ;Address of the link control block
ENDS

[253] The structure shown above is what is stored as leaf data in the SP tree search memory.
This leaf is found through a search of DevHandle, command FCLUN and port number. The
vlunKey field is used as the key to search for the VLUN leaf. The initiatorKey field is used as
the key to search for the initiator PDEVPATH leaf. The flags field is used to
indicate reservations, zoning and ua29. The fcaPortNpPort field is the source FCASIC port
identifier (upper 2 bits) and the source SP port (lower 6 bits, in DDpppp format) of where the
request came from.
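
As an illustration, the HLUN leaf and the decoding of its fcaPortNpPort byte might be expressed in C as below. The type and function names are hypothetical, and a "word" is assumed to be 32 bits wide. The split into an upper 2-bit FCASIC port identifier and a lower 6-bit SP port follows the description above.

```c
#include <stdint.h>

/* Hypothetical C view of the HLUN leaf; "word" is taken to be 32 bits. */
typedef struct {
    uint32_t vlun_key;          /* key used to look for the VLUN            */
    uint32_t initiator_key;     /* key used to look for the server pdevpath */
    uint32_t flags;             /* ua29, lock/zoning, etc.                  */
    uint8_t  fca_port_np_port;  /* source fcaPort (2b), npPort (6b)         */
    uint32_t link_cb;           /* address of the link control block        */
} hlun_leaf_t;

/* Upper 2 bits: source FCASIC port identifier. */
unsigned hlun_fca_port(const hlun_leaf_t *h)
{
    return (h->fca_port_np_port >> 6) & 0x3u;
}

/* Lower 6 bits: source SP port. */
unsigned hlun_np_port(const hlun_leaf_t *h)
{
    return h->fca_port_np_port & 0x3Fu;
}
```

For example, a byte value of 0x85 (binary 10 000101) decodes to FCASIC port 2 and SP port 5.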

[254] The VLUN leaf contains information about a VLUN together with its composition.
A VLUN can be made up of sections of a PLUN. Such a section is known as a slice.
[255] The VLUN is a structure that describes what the VLUN is composed of. The VLUN
contains the following features. LUN type can be a virtual VSX device or a native device. State
indicates the state of the VLUN. Total Blocks indicates the number of blocks on the VLUN.
Block Size indicates the number of bytes/block.

[256] The VLUN also contains information about each slice. A VLUN can include many
PLUNs. Each component is referred to as a slice. The slice information kept includes the
following. SLICE_END is the end of a slice with respect to the VLUN. SLICE_OFFSET is the
offset within the PLUN. SLICE_BLKS is the number of blocks within the slice. PLUN_KEY is
a key to search for the PLUN; the key is with respect to the slice.

[257] The slices are kept as part of the VLUN structure. The picocode walks through the slices
to determine which PLUN the IO goes to. With this, there may only be room for up to 3 slices.
[258] Once a VLUN leaf is yielded from a Tree Search, the picocode will walk the slices to see
which slices are involved in the request. Once the correct slice is identified, LM will use the
sliceOffset to calculate the new start LBA of the request and update the wks. Requests that cross
slice boundaries may be handled, and the LM may also calculate the requested blocks.

[259] At the same time, a search for the PLUN is started using the pLunKey in the slice. This
will yield a PLUN leaf.
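
The slice walk can be sketched in C as follows. The names slice_t, vlun_leaf_t and vlun_map_lba are hypothetical. The sketch assumes that the slice end is the last VLUN block of the slice (inclusive), that slices are stored in ascending order, and that an unused slice has a block count of zero; the text does not state these details.

```c
#include <stdint.h>

#define VLUN_MAX_SLICES 3

typedef struct {
    uint32_t end;       /* last VLUN block covered by the slice (assumed inclusive) */
    uint32_t offset;    /* offset of the slice within the PLUN                      */
    uint32_t blks;      /* number of blocks in the slice                            */
    uint16_t plun_key;  /* key used to search for the PLUN                          */
} slice_t;

typedef struct {
    uint32_t total_blks;
    slice_t  slice[VLUN_MAX_SLICES];
} vlun_leaf_t;

/*
 * Map the start of a virtual request onto a slice. On success, returns the
 * slice index and fills in the physical start LBA, the number of requested
 * blocks that fall inside that slice, and the PLUN key. Returns -1 if the
 * request is out of bounds or no slice covers the start LBA.
 */
int vlun_map_lba(const vlun_leaf_t *v, uint32_t vlun_lba, uint32_t req_blks,
                 uint32_t *plun_lba, uint32_t *plun_blks, uint16_t *plun_key)
{
    if (vlun_lba >= v->total_blks || req_blks > v->total_blks - vlun_lba) {
        return -1;                       /* request bounds check */
    }
    for (int i = 0; i < VLUN_MAX_SLICES; i++) {
        const slice_t *s = &v->slice[i];
        if (s->blks == 0) {
            continue;                    /* unused slice */
        }
        if (vlun_lba <= s->end) {
            uint32_t slice_start = s->end + 1u - s->blks;
            uint32_t remaining   = s->end - vlun_lba + 1u;
            *plun_lba  = s->offset + (vlun_lba - slice_start);
            *plun_blks = (req_blks < remaining) ? req_blks : remaining;
            *plun_key  = s->plun_key;
            return i;
        }
    }
    return -1;
}
```

A request that crosses a slice boundary would call the function again with the start LBA advanced by the returned block count, which is one possible way to produce the several physical IOs mentioned earlier.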

[260] The LPM search mechanism with Roping may be used, decoupling the slices from the
VLUN. The search into the slices will use a VLUN key with the command start block address,
yielding a leaf in the slice table. Picocode will then go to the next slice by walking the next
element address in the leaf, with linking provided by the Roping services.
[261] The VLUN Leaf Structure is as follows.
[262]

vlun STRUCT
    vType           byte    ;virtual device type - stripe/mirror/native
    scsiDevType     byte    ;SCSI device type
    state           byte    ;state of this vlun
    totalBlks       word    ;total blks in vlun
    blkSize         word    ;blk size in vlun

    ;/* Slice 0 */
    slice0End       word    ;offset within VLUN
    slice0Offset    word    ;offset within the PLUN
    slice0Blks      word    ;blks in this slice
    plunKey0        hword   ;key of the plun

    ;/* Slice 1 */
    slice1End       word    ;offset within VLUN
    slice1Offset    word    ;offset within the PLUN
    slice1Blks      word    ;blks in this slice
    plunKey1        hword   ;key of the plun

    ;/* Slice 2 */
    slice2End       word    ;offset within VLUN
    slice2Offset    word    ;offset within the PLUN
    slice2Blks      word    ;blks in this slice
    plunKey2        hword   ;key of the plun
ENDS

[263] The structure shown above is what is stored as leaf data in the SP tree search memory.
This leaf is found from the vlunKey found in the HLUN. The vType field identifies whether
the VLUN is a native device, concatenation, partition, mirror or stripe. The scsiDevType
identifies whether the device is DISK, TAPE, SACL, etc. The state field tells the state of the
VLUN, with a zero value specifying that it is operational. The totalBlks field specifies the total
capacity of the VLUN, and this field is used by picocode to check the request bounds. The
blkSize field is the bytes/block for the VLUN, and can be used to calculate the number of bytes
of a request.

[264] There are three slices in a VLUN, allowing a VLUN to be constructed out of three
physical devices. Fields in a single slice are as follows.

[265] The sliceEnd field is the ending block number of the VLUN in the slice. The sliceOffset
field is the offset into the PLUN in the slice. The sliceBlks field is the number of blocks in the
slice. The plunKey field is used to search for the PLUN the slice is associated with.

[266] The PLUNup table is used on the upstream SP to look for the PLUN. There is a
PLUNdown table that is used by the downstream SP. The PLUNdown table contains smaller
leaf sizes.

[267] The PLUN leaf contains the following information. The LunNumber is the physical lun
number. The Block Size is the bytes/block for the physical LUN. The Target DMU is a field
which specifies which DMU to send this request downstream, which matters since there are two
egress datastores. Regarding DS0/1, DS0 is connected to DMU A/B and DS1 is connected to
DMU C/D. The Target DS field specifies which DS on the egress side to send the request to.
The Target Blade field specifies the target blade number of the request.
[268] The PLUN leaf also contains the Downstream LID, which is a key used by the
downstream SP to search for the PLUN or PDEVPATH. The MSB specifies whether the key is
used to search for a PLUN or PDEVPATH. If the downstream SP has multiple paths to the
device, the key is used to search for the PLUN; otherwise it is used to search for the
PDEVPATH.

[269] The LM will search for a PLUN leaf using the plunKey in the VLUN leaf. From the leaf,
LM may update a register with the physical fclun field, update the FCBpage TB field after
choosing a path, update the FCBpage target DMU/DSU fields, and update the Ethernet
encapsulation header LLC field with the PathLID.
[270] The PLUN Leaf Structure is as follows.
[271]

plunUp STRUCT
    lunNum            hword   ;lun number within the physical device
    totalBlks         word    ;total blks in this lun
    blkSize           word    ;blk size of this lun
    prefPath          byte    ;preferred path to take

    ;/* Path 0 - 10 bytes */
    path0St           byte    ;State (4b), rsv (3b), prio (1b)
    path0PortDmuDsu   byte    ;Port (1b), rsv (1b), dmu (2b), dsu (4b)
    path0BladeQid     word    ;Blade (2B), rsv (5b), QID (10b)
    path0Lid          word    ;lookup id for downstream PLUN/PDP

    ;/* Path 1 - 10 bytes */
    path1St           byte
    path1PortDmuDsu   byte
    path1BladeQid     word
    path1Lid          word

    ;/* Path 2 - 10 bytes */
    path2St           byte
    path2PortDmuDsu   byte
    path2BladeQid     word
    path2Lid          word
ENDS


[272] The lunNum field is the LU number of the LU behind the port. The totalBlks is the total
blocks in the LU. The blkSize is the block size of the LU. The prefPath field is an indicator of
which path to use, and is a static path selection. If a path needs to be changed, the SMS will
update the field. The pathSt field is used to indicate the state of the path. The pathPortDmuDsu
is used to indicate the target blade DMU and DSU, and is used when programming the FCBpage
registers.

[273] The bladeQid field is a concatenation of the target blade and the source QID. The QID is
programmed into the source routing information, and may be used to program the FCBpage
when responses come back into the egress side.

[274] The pathLid field is used as a lookup for the downstream SP. In the pathLid, the MSbit
indicates whether there are multiple paths to the device downstream. If the MSbit is clear, there
is only a single path. The pathLid will then be used to look up a pdevpath downstream. If the
MSbit is set, the lookup will be for a PLUN.

[275] On the downstream side, the LM will look into the LLC field of the Ethernet
encapsulation header and extract the LID. The LID can be used to search for either the
PLUNdown leaf or the pDevPath leaf directly. If there are multiple paths to the PLUN on the
downstream SP, the LID will have the MPATH bit set. The LID will then be used as a key to the
TSE to search the PLUNdown tree for a leaf. If the MPATH bit is clear, then there is only a
single path and the LID will be used to search the pDevPath tree directly.
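
The downstream decision on the LID can be sketched in C as follows. The sketch assumes a 32-bit LID whose most significant bit is the MPATH bit; the constant and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed position of the MPATH bit: the most significant bit of a 32-bit LID. */
#define LID_MPATH_BIT 0x80000000u

typedef enum {
    SEARCH_PDEVPATH,   /* single path: search the pDevPath tree directly */
    SEARCH_PLUNDOWN    /* multiple paths: search the PLUNdown tree       */
} lid_tree_t;

/* Select the tree to search from the LID taken out of the LLC field. */
lid_tree_t lid_select_tree(uint32_t lid)
{
    return (lid & LID_MPATH_BIT) ? SEARCH_PLUNDOWN : SEARCH_PDEVPATH;
}
```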
[276] The PLUNdown leaf contains the following. The prefPath is the preferred path to use.
The pathState is the state of a particular path. The pathKey is used to search for the pDevPath
leaf. LM will choose a path using the prefPath and pathState fields and start a search on the
pDevPath tree.
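
One possible path selection is sketched below in C. The text says only that the prefPath and pathState fields are used; the rule shown here (use the preferred path when it is usable, otherwise the first usable path) and the convention that a state of zero means usable are assumptions, as are all identifier names.

```c
#include <stdint.h>

#define PLUN_PATHS    3
#define PATH_STATE_OK 0   /* assumption: zero marks a usable path */

/* Hypothetical C view of the PLUNdown leaf. */
typedef struct {
    uint8_t  pref_path;               /* preferred path index      */
    uint8_t  path_state[PLUN_PATHS];  /* state of each path        */
    uint32_t path_key[PLUN_PATHS];    /* key of each pDevPath leaf */
} plun_down_t;

/*
 * Return the index of the path to use, or -1 if no path is usable.
 * The key for the pDevPath search is then path_key[index].
 */
int plun_down_choose_path(const plun_down_t *p)
{
    if (p->pref_path < PLUN_PATHS &&
        p->path_state[p->pref_path] == PATH_STATE_OK) {
        return p->pref_path;
    }
    for (int i = 0; i < PLUN_PATHS; i++) {
        if (p->path_state[i] == PATH_STATE_OK) {
            return i;
        }
    }
    return -1;
}
```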

[277] The PLUNdown Leaf Structure is as follows.
[278]

plunDown STRUCT
    prefPath        byte    ;preferred path to take

    ;/* Path 0 */
    path0State      byte    ;state of this path
    path0Key        word    ;key of the pDevPath

    ;/* Path 1 */
    path1State      byte    ;state of this path
    path1Key        word    ;key of the pDevPath

    ;/* Path 2 */
    path2State      byte    ;state of this path
    path2Key        word    ;key of the pDevPath
ENDS

[279] The PLUNdown structure is used on the downstream side. The prefPath field is used
to select one of 3 possible paths to a PLUN. The pathState field indicates the state of a path. The
pathKey is used as a key to look for the pdevpath leaf.

[280] A pdevpath is a structure that can represent a physical connection to a storage device or
server, but does not represent LUNs behind the physical storage. A pdevpath contains the following.
[281] FC_ID is the server or storage FC id. The MaxRxData field shows the maximum frame
size the storage/server can receive. The Bbcredit field is the number of BB credits the
server/storage has given during the LOGIN process. Port is the port number on the SP to which the
server/storage is attached.

[282] A pDevPath leaf can represent a server or a path to a storage device. A key to the server
pDevPath comes from a field in the HLUN leaf. The key to the device pDevPath comes from
the LID in the Ethernet encapsulation header on the downstream SP.

[283] The pDevPath Leaf Structure is as follows.
[284]

pDevPath STRUCT
    portHandle      word    ;FCASIC portHandle
    port            byte    ;SP port number
    fcaPort         byte    ;FC ASIC port number
ENDS

[285] The portHandle field is a handle to the physical device that is known to the FCASIC.
When picocode performs IO to a physical device, it passes this handle down to the FCASIC for it
to identify the device. The port field is the SP port number in DDpppp format. The fcaPort field
is the FC ASIC port identity.
[286] A port structure contains information about our own SP port. It contains information such
as FCID, which is used by the FCP code. The port structures are in tree search memory. Since
there are only a small number of ports on the SP, the lookup is done using an index into an array
to find the port CB address. This should be faster than using the TS engine.

[287] The following paragraphs describe how the LM tables in the TSE become populated.
[288] The LM tables in the TSE get populated from the LM in the virtual server card. The LM
in the VSC has similar structures to those used by picocode. The difference between them is that
the picocode structures are more compact and integrated.
[289] Pdevpath leafs (see structure below) exist for a physical device on the SP to which it is
attached, and thus a pdevpath leaf will be programmed on the SP where a storage device or
initiator is attached.

[290] The pdevpath fields in the leaf are filled in entirely from the PDEVPATH structure in the
VSC.
[291]

typedef struct
{
    U32 fcaDevHandle _PKD_;   /* FCASIC device handle */
    U8  npPort       _PKD_;   /* PPPP DD format */
    U8  fcaPortId    _PKD_;   /* FCA port identity (2b) */
} picoPdpT _PKD_;


[292] The fcaDevHandle is filled in from the VSC (also referred to as the e405)
pdevpath.fcaDevHandle. This field was given to the e405 when a new device was found by the
FC ASIC, and is a handle to the device used by the FC ASIC.

[293] The npPort is filled in from the e405 pdevpath.npPort. This field has 2 elements, port
and DMU, and was given to the e405 when a new device was found. The npPort field indicates
which DMU the device is attached to. Since the SP is operating in POS format, the port number
is 0.

[294] The fcaPortId is filled in from the e405 pdevpath.fcPortId. It is an identity of the FC
ASIC port on which the device was discovered, and is given to the e405 when a "New Device
Report" is sent. The key used to program the pdevpath leaf is the pdevpath.pdHandle.
[295] The PLUNUP Leaf (see structure below) exists on the SP where there is a VLUN exported
to a host, and is used by the SP to find where to ship the frame downstream. The lunNum is
filled directly from the e405 plun.lunNum field, and is the LU number behind the physical
device. The totalBlks is filled from the e405 plun.blkCount field. The blkSize is filled from the
e405 plun.blkSize.

[296] The PLUNUP leaf contains 3 paths to the downstream SP, similar to the array of
pdevpath pointers in the e405 PLUN structure. The prefPath field instructs picocode to use a
particular path. The configuration software will look at the plun.preferred path to fill in the correct
index in the leaf.

[297] The pathSt field is used to indicate the state of a path. It is filled from the e405
plun.path.state field. The e405 goes through the pdevpath structure from the PLUN to get this field.
[298] The pathPortDmuDsu is a combination of the downstream FCASIC portId, target
downstream DMU, and DSU, and is filled in from the plun.path.fcaPortId and the
plun.path.bladePort fields. The configuration software can determine the DSU from the
plun.path.bladePort field. The DMU/DSU fields have to be determined in advance because the
FCBpage is filled in with these target parameters.
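
A C sketch of how configuration software might assemble the pathPortDmuDsu byte is given below. The field widths (Port 1 bit, reserved 1 bit, DMU 2 bits, DSU 4 bits) are taken from the PLUN leaf listing; placing them in that order from the most significant bit down, numbering DMU A through D as 0 through 3, and the function names are assumptions.

```c
#include <stdint.h>

/*
 * Assumed bit order, most significant bit first:
 *   bit 7      Port (1b)
 *   bit 6      reserved (1b)
 *   bits 5..4  DMU (2b)
 *   bits 3..0  DSU (4b)
 */
uint8_t pack_port_dmu_dsu(unsigned port, unsigned dmu, unsigned dsu)
{
    return (uint8_t)(((port & 0x1u) << 7) | ((dmu & 0x3u) << 4) | (dsu & 0xFu));
}

/*
 * Egress data store for a DMU, following the statement that DS0 is connected
 * to DMU A/B and DS1 to DMU C/D. DMU A..D are numbered 0..3 here (assumed).
 */
unsigned ds_for_dmu(unsigned dmu)
{
    return (dmu < 2u) ? 0u : 1u;
}
```

How the data store number is encoded in the 4-bit DSU field is not stated in the text, so the sketch leaves that value to the caller.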

[299] The bladeQid field is a combination of the target downstream blade number and the QID
parameter. The QID parameter is for the scheduler, and is filled in from the plun.path.bladeId.
The bladeLid field is used as a lookup on the downstream SP to find either the PLUNDown or
PDEVPATH leaf, and is filled in from the plun.path.bladeLid field.
[300] The key used to program this leaf is the plun.plHandle.

[301]

typedef struct
{
    U16 lunNum          _PKD_;
    U32 totalBlks       _PKD_;
    U32 blkSize         _PKD_;
    U8  prefPath        _PKD_;
    /* Path 0 */
    U8  path0St         _PKD_;
    U8  path0PortDmuDsu _PKD_;
    U32 path0BladeQid   _PKD_;
    U32 path0Lid        _PKD_;
    /* Path 1 */
    U8  path1St         _PKD_;
    U8  path1PortDmuDsu _PKD_;
    U32 path1BladeQid   _PKD_;
    U32 path1Lid        _PKD_;
    /* Path 2 */
    U8  path2St         _PKD_;
    U8  path2PortDmuDsu _PKD_;
    U32 path2BladeQid   _PKD_;
    U32 path2Lid        _PKD_;
} picoPlunUpT _PKD_;

[302] VLUN leafs (see structure below) are programmed into the SP where there is a host with
the VLUN exported. The vType field is filled in from the e405 vlun.type field. The scsiDevType
field is filled in from the e405 vlun.devType field. The state is filled in from the e405 vlun.state
field. The totalBlks and blkSize are filled in from the e405 vlun.totalBlks and vlun.blkSize
fields.

[303] The vlun can be created out of 3 slices. The sliceEnd field is the ending virtual block in
the slice, and is filled from the e405 vlun.slice.vlunEnd. The sliceOffset field is the offset into
the PLUN, and is filled in from the e405 vlun.slice.plunOffset. The sliceBlks field is the number
of blocks in the slice, and is filled in from the e405 vlun.slice.blkCount. The plunKey field is
used as the key for looking up the PLUN, and is filled in from the e405 vlun.slice.dev.handle.
[304] The key used to program this leaf is the vlun.handle.
[305]

typedef struct
{
    U8  vType        _PKD_;
    U8  scsiDevType  _PKD_;
    U8  state        _PKD_;
    U32 totalBlks    _PKD_;
    U32 blkSize      _PKD_;
    /* Slice 0 */
    U32 slice0End    _PKD_;
    U32 slice0Offset _PKD_;
    U32 slice0Blks   _PKD_;
    U16 plunKey0     _PKD_;
    /* Slice 1 */
    U32 slice1End    _PKD_;
    U32 slice1Offset _PKD_;
    U32 slice1Blks   _PKD_;
    U16 plunKey1     _PKD_;
    /* Slice 2 */
    U32 slice2End    _PKD_;
    U32 slice2Offset _PKD_;
    U32 slice2Blks   _PKD_;
    U16 plunKey2     _PKD_;
} picoVlunT _PKD_;


[306] The HLUN leaf (see structure below) is programmed into the SP where there is a VLUN
exported to a host. The vlunKey is used to look up the VLUN leaf, and is filled in from the e405
hlun.vLun.handle field. The initiatorKey is used to look up the host pdevpath leaf, and is filled
in from the e405 hlun.src.pdHandle field.

[307] The fcaPortDmuDsu is used as the source fcaPort, DMU and DSU fields, and is taken
from the hlun.src.fcaPortId and hlun.npPort, which indicates the DMU. The DSU field is derived
from the DMU.
[308] The eHandle field is a handle to the e405 HLUN and will be passed back to the e405
when a proxy command comes in, to provide a fast lookup of the HLUN structure.

[309] The key used to program the leaf is based on the FCAPORTID, DevHandle, and FCLUN.

[310]

typedef struct
{
    U32 vlunKey       _PKD_;
    U32 initiatorKey  _PKD_;
    U32 flags         _PKD_;
    U8  fcaPortDmuDsu _PKD_;
    U32 eHandle       _PKD_;
    U32 linkCB        _PKD_;
} picoHlunT _PKD_;

[311] The PLUNDown leaf (see structure below) is programmed onto the SP where the storage
device is connected. The prefPath field is used to indicate which path index to use when sending
the frame out, and is filled in from the plun.prefferedPath field.

[312] There are three paths to choose from. The pathState field is used to indicate the state of
the path. It is filled in from the e405 plun.path.state. The pathKey is filled in from the e405
plun.path.pdHandle.
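
As an illustration of how the prefPath and pathState fields might be consulted together, the following is a minimal sketch in C. It is not taken from the described implementation: the PATH_UP encoding, the array layout of the leaf, and the fallback order (try the preferred path first, then the remaining paths in index order) are all assumptions made for the example.

```c
#include <stdint.h>

typedef uint8_t  U8;
typedef uint32_t U32;

#define NUM_PATHS 3
#define PATH_UP   1   /* hypothetical encoding of a usable path state */

/* Simplified, unpacked view of the PLUNDown leaf (array form is an assumption). */
typedef struct {
    U8  prefPath;              /* preferred path index */
    U8  pathState[NUM_PATHS];  /* state of each path */
    U32 pathKey[NUM_PATHS];    /* key of each path's PDEVPATH leaf */
} plunDownT;

/* Returns the index of the first usable path, starting at the preferred
 * path and wrapping around, or -1 if no path is usable. */
int pick_path(const plunDownT *p)
{
    for (int n = 0; n < NUM_PATHS; n++) {
        int i = (p->prefPath + n) % NUM_PATHS;
        if (p->pathState[i] == PATH_UP)
            return i;
    }
    return -1;
}
```

With a leaf whose preferred path is 1 and whose path states are up, down, up, this sketch would skip path 1 and return path 2; the corresponding pathKey entry would then be used for the PDEVPATH lookup.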

[313]

typedef struct
{
    U8  prefPath   _PKD_;
    /* Path 0 */
    U8  path0State _PKD_;
    U32 path0Key   _PKD_;
    /* Path 1 */
    U8  path1State _PKD_;
    U32 path1Key   _PKD_;
    /* Path 2 */
    U8  path2State _PKD_;
    U32 path2Key   _PKD_;
} picoPlunDownT _PKD_;

[314] The storage server 100 implements various public interfaces, including lm_cmd_i and
lm_cmd_e, as follows.

[315] The lm_cmd_i walks the VLUN structure to calculate the new starting LBA and number
of blocks for the request. It will pick a path in the case where the PLUN is connected through
multiple paths. The path is UPSTREAM INGRESS COMMAND, and it is called by SM
upstream after it starts the TSE to check for existence of a HLUN.
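
The walk over the VLUN slices can be sketched in C as follows. This is only an illustration under stated assumptions: the text does not define the slice fields, so the sketch assumes that the End field is the first virtual LBA past the slice, that the Offset field is the slice's starting LBA on the backing PLUN, and that a request crossing a slice boundary is clipped to the current slice. The sliceT type and the vlun_map function are hypothetical names.

```c
#include <stdint.h>

typedef uint16_t U16;
typedef uint32_t U32;

#define NUM_SLICES 3

/* Simplified, unpacked view of one slice of the VLUN leaf. */
typedef struct {
    U32 end;      /* assumed: first virtual LBA past this slice */
    U32 offset;   /* assumed: starting LBA of the slice on the PLUN */
    U32 blks;     /* number of blocks in the slice */
    U16 plunKey;  /* key of the PLUN backing the slice */
} sliceT;

/* Maps a virtual (vLba, reqBlks) request onto the slice containing vLba.
 * On success returns 0 and fills in the PLUN key, the physical LBA, and
 * the block count (clipped to the end of the slice). Returns -1 if vLba
 * does not fall inside any slice. */
int vlun_map(const sliceT s[NUM_SLICES], U32 vLba, U32 reqBlks,
             U16 *plunKey, U32 *pLba, U32 *pBlks)
{
    for (int i = 0; i < NUM_SLICES; i++) {
        if (s[i].blks == 0 || vLba >= s[i].end)
            continue;                        /* unused slice, or request lies beyond it */
        U32 start = s[i].end - s[i].blks;    /* first virtual LBA of the slice */
        if (vLba < start)
            return -1;                       /* falls in a gap before this slice */
        U32 avail = s[i].end - vLba;         /* blocks left in the slice */
        *plunKey = s[i].plunKey;
        *pLba    = s[i].offset + (vLba - start);
        *pBlks   = reqBlks < avail ? reqBlks : avail;
        return 0;
    }
    return -1;
}
```

Under these assumptions, a request that starts five blocks before the end of slice 0 and asks for ten blocks would be mapped to slice 0 with a block count of five, leaving the caller to issue the remainder against slice 1.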

[316] The following data is used. The IoCB should be in scratch1 shared memory. Tree search
results for a HLUN should be in the TSR0 area. The frame should be in datapool. W24 should
have the LBA, and W26 should have the number of blocks.

[317] R20 is the RDWR flag: 0 = RDWR command, 1 = not a RDWR command. LM will not modify
startLBA and reqBlks if the command is not a RDWR command.

[318] The following data is modified. IoCB.hpLun will have the leaf address of the hlun.
The PlunHandle, which is used for downstream lookup, will be inserted into the TAGS porthandle
field. W24 will have the physical LBA and W26 the physical number of blocks. The FCBpage TDMU
register is updated. The FCBpage DSU register is updated.

[319] The FCBpage TB registers are updated with the target blade. TAGS.src.TB is modified with
the TB of this SP. TAGS.src.QID is modified with the target port used for enqueuing at the
upstream side. TAGS.src.FCAport is modified with the upstream FCASIC port identifier.
TAGS.src.DMU is modified with the upstream DMU used to return data to the initiator.
TAGS.src.DSU is modified with the upstream target DS unit used to return data to the
initiator.

[320] IoCB.riTblade will be filled with the target blade. IoCB.riTqid is filled with the target
QID. IoCB.riPortDmuDsu is filled with the target port, DMU, and DSU.

[321] Return data is as follows: R20 - status as defined in vsxstat.inc; R21 - FCLUN; R18 - 0
if VLUN is native, 1 if VLUN is NOT native; W24 - new startLBA; and W26 - new ReqBlks.

[322] The lm_cmd_e is used to pick a path to the physical device. The selection is made from
the plunHandle passed in the packet LLC field. The path is DOWNSTREAM EGRESS
COMMAND, and it is called by FCP downstream after receiving a command packet. The
command uses various inputs, including the IoCB stored in scratch1 shared memory.

[323] Modified data includes TAGS.dst.TB modified with the destination target blade, and
TAGS.dst.QID modified with the target port used for enqueuing at the downstream side, if known.
Other modified data includes TAGS.dst.FCAport modified with the downstream FCASIC port
identifier, if known, TAGS.dst.DMU modified with the destination target DMU, TAGS.dst.DSU
modified with the destination target DSU, and IoCB.tgtPort, which will have the SP port number
connected to the device.

[324] Further modified data includes IoCB.maxRxData, which will have the maximum data the
device can receive, IoCB.hpLun, which will have the leaf address of the plun, and IoCB.prefPath,
which will have the preferred path that was picked.

[325] Return data includes R20 - status as defined in vsxstat.inc, R21 - maxRxdata of device,
and R15[1] - output port.

[326] In operation, the code will extract the handle passed in from the upstream SP in the
Ethernet LLC header field. If the handle has the multipath bit set, the handle will be used to
search in the PLUN tree. From the PLUN leaf, a path will be selected. Each path in the PLUN
leaf has a key. The key will be used to search through the PDEVPATH table. The PDEVPATH
leaf will have the device information. Inside the PDEVPATH, the port will be used to search for
the FCPORT structure, which is another PDEVPATH leaf.

[327] In the case where the multipath bit is NOT set, there is only a single path to the device.
The key is used to look directly into the PDEVPATH table. This provides the device
PDEVPATH leaf. The search for the FCPORT structure is still performed.
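
The two lookup cases described above can be sketched in C as follows. The sketch is illustrative only: the position of the multipath bit, the use of the remaining handle bits as the search key, the linear searches standing in for the tree search engine, and the names plunT, pdevpathT, resolve_device, and resolve_fcport are all assumptions rather than details of the described embodiment. The PLUN entry is shown with its path already selected.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t U32;

#define MULTIPATH_BIT 0x80000000u   /* hypothetical bit position */

typedef struct { U32 key; U32 port; }    pdevpathT; /* PDEVPATH leaf (simplified) */
typedef struct { U32 key; U32 pathKey; } plunT;     /* PLUN leaf, path already chosen */

static const pdevpathT *find_pdevpath(const pdevpathT *t, size_t n, U32 key)
{
    for (size_t i = 0; i < n; i++)
        if (t[i].key == key)
            return &t[i];
    return NULL;
}

static const plunT *find_plun(const plunT *t, size_t n, U32 key)
{
    for (size_t i = 0; i < n; i++)
        if (t[i].key == key)
            return &t[i];
    return NULL;
}

/* Resolves a handle to the device PDEVPATH leaf. With the multipath bit
 * set, the handle is first looked up in the PLUN table and the selected
 * path's key is used; otherwise the handle keys the PDEVPATH table directly. */
const pdevpathT *resolve_device(U32 handle,
                                const plunT *pluns, size_t nPluns,
                                const pdevpathT *paths, size_t nPaths)
{
    U32 key = handle & ~MULTIPATH_BIT;
    if (handle & MULTIPATH_BIT) {
        const plunT *pl = find_plun(pluns, nPluns, key);
        if (pl == NULL)
            return NULL;
        key = pl->pathKey;
    }
    return find_pdevpath(paths, nPaths, key);
}

/* The FCPORT structure is another PDEVPATH leaf, keyed by the device's port. */
const pdevpathT *resolve_fcport(const pdevpathT *dev,
                                const pdevpathT *paths, size_t nPaths)
{
    return dev ? find_pdevpath(paths, nPaths, dev->port) : NULL;
}
```

In both cases the final step is the same: once the device leaf is found, its port field is used for a second search of the same table to reach the FCPORT entry.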

[328] Although the above description has focused on specific embodiments, various
alternatives and equivalents would be within the understanding of one of ordinary skill in the art.
Therefore, the invention is to be defined with reference to the following claims and their
equivalents.
